LLM Hallucinations Are Inevitable. Limiting Them Requires a Trusted Data Foundation.

Businesses must address the risks of AI hallucinations by using LLMs appropriately, implementing strict data validation, and adopting advanced data unification technologies, says Ansh Kanwar, EVP of product, technology & strategy at Reltio.


While there are many clever examples of users provoking AI chatbots into giving humorous or nonsensical answers, the real-world consequences of AI hallucinations are no laughing matter. Examples include made-up historical references, such as a fictional meeting between James Joyce and Vladimir Lenin; a travel chatbot inventing an ice cream shop in Pittsburgh; and lawyers citing non-existent case law.

While LLM hallucinations pose risks to businesses and create mistrust among consumers, eliminating them entirely is impossible with today’s technology. While LLMs may seem intelligent, they are stochastic models trained to predict the next word from patterns in their training data, not from a true understanding of facts. LLMs generate confident-sounding statements that seem realistic but may be wholly fabricated, like the Joyce-Lenin meeting, which even included small invented details such as the context, location, and time of the encounter. LLMs can also make predictions from erroneous data and faulty inputs, making mistakes inevitable. Further compounding these problems is the inability of LLMs to fact-check themselves: they simply cannot tell users whether they are providing trusted information, leaving it to humans to judge the outputs.

So, what can businesses do to address these problems? A good place to start for companies implementing LLMs is to ensure the models are applied to appropriate use cases. For example, they are probably unwise for scientific research, but well suited to augmenting creativity, summarizing complex text, translating, and supporting customer service. To minimize fabricated information, businesses should employ robust data unification frameworks and technology, ensuring data quality, connectivity, and trustworthiness. This is particularly crucial for LLMs and generative AI models, which rely on real-time data input to produce dynamic and personalized outputs. By implementing strict data validation processes, regularly auditing AI systems for accuracy and bias, and leveraging advanced data unification solutions, companies can create a solid foundation for responsible AI deployment.
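To make the idea of a validation gate concrete, here is a minimal Python sketch of the kind of checks that might run before a record is allowed into an LLM's data pipeline; the field names, rules, and 90-day freshness threshold are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative validation gate: records must pass basic quality checks
# before they reach an LLM's retrieval corpus. Field names and the
# freshness threshold are hypothetical, not a prescribed schema.

@dataclass
class CustomerRecord:
    record_id: str
    name: str
    email: str
    last_verified: datetime

def is_trustworthy(rec: CustomerRecord, max_age_days: int = 90) -> bool:
    """Reject records that are incomplete, malformed, or stale."""
    if not rec.record_id or not rec.name:          # completeness
        return False
    if "@" not in rec.email:                       # basic format check
        return False
    age = datetime.now(timezone.utc) - rec.last_verified
    return age <= timedelta(days=max_age_days)     # timeliness

records = [
    CustomerRecord("c-001", "Acme Corp", "ops@acme.example",
                   datetime.now(timezone.utc)),
    CustomerRecord("c-002", "", "not-an-email",
                   datetime(2020, 1, 1, tzinfo=timezone.utc)),
]
clean = [r for r in records if is_trustworthy(r)]
print(f"{len(clean)} of {len(records)} records pass validation")
```

Real validation layers add far richer rules, such as reference-data checks, deduplication, and lineage tracking, but even simple completeness and freshness gates keep obviously bad records away from the model.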

Tailoring LLMs to Your Business With Company-Specific Knowledge

As enterprises increasingly rely on LLMs to enhance their operations and customer interactions, it is crucial to address the mistrust arising from these hallucinations. The first step for enterprises is to address model training and introduce sources to produce company-specific responses.

LLMs are foundational models trained on vast amounts of static, pre-collected text data. While this allows them to develop a broad understanding of language, their responses are often generic and lack company-specific context. Enterprises can, however, augment LLMs to provide tailored, domain-specific responses using techniques like Retrieval-Augmented Generation (RAG) and graph augmentation.

RAG involves creating a searchable database of company documents from which relevant passages are retrieved and supplied to the LLM to inform its responses. The key to successful RAG implementation is ensuring the data feeding the retrieval system is of the highest fidelity, meaning data that is accurate, complete, consistent, timely, and relevant for its intended purpose. This requires enterprises to invest in robust data management and unification solutions, including real-time updating and quality control capabilities. By maintaining a clean, comprehensive, and up-to-date corpus of company information, enterprises can ensure the LLM has access to accurate, relevant context when generating responses.
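Here is a minimal sketch of the RAG pattern in Python; the document store, keyword-overlap scoring, and prompt template are simplified stand-ins, and a production system would typically use a vector database, embeddings, and an actual LLM client.

```python
import re

# Minimal RAG sketch: retrieve the most relevant company documents for a
# question and build a grounded prompt. The corpus, scoring, and prompt
# template are simplified placeholders.

DOCUMENTS = {
    "returns-policy": "Customers may return unopened items within 30 days.",
    "support-hours": "Support is available 9am-6pm ET, Monday through Friday.",
    "warranty": "Hardware carries a one-year limited warranty.",
}

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = tokenize(query)
    ranked = sorted(DOCUMENTS.values(),
                    key=lambda doc: len(q_terms & tokenize(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Ground the model in retrieved company text to limit hallucination."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return ("Answer using ONLY the context below. If the answer is not in "
            f"the context, say you do not know.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

# The assembled prompt is what gets sent to whichever LLM the enterprise uses.
print(build_prompt("What is the return window in days?"))
```

The instruction to answer only from the supplied context is what ties the model's output back to trusted company data rather than to whatever it memorized during training.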

Graph augmentation, on the other hand, involves constructing a structured knowledge graph of company-specific entities and relationships. At inference time, the system can traverse this graph and supply the retrieved facts to the LLM, allowing it to incorporate company-specific facts and terminology into its responses. As with RAG, the effectiveness of graph augmentation depends on the quality of the underlying data. Enterprises must invest in building and maintaining a well-structured, accurate, and complete knowledge graph that reflects the company's current state.
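A toy illustration of graph augmentation follows, assuming a small in-memory graph; the entity names and relations are invented for the example, and real deployments would query a governed graph or master-data platform instead.

```python
# Toy knowledge graph of company-specific entities and relationships.
# Entity names and relations are invented for illustration.

GRAPH = {
    "Acme Corp":   {"is_a": ["customer"], "owns": ["Contract-42"], "region": ["EMEA"]},
    "Contract-42": {"renews_on": ["2025-09-30"], "covers": ["Product-X"]},
    "Product-X":   {"tier": ["enterprise"]},
}

def collect_facts(entity: str, depth: int = 2) -> list[str]:
    """Walk outgoing edges up to `depth` hops and emit readable facts."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, targets in GRAPH.get(node, {}).items():
                for target in targets:
                    facts.append(f"{node} {relation} {target}")
                    if target in GRAPH and target not in seen:
                        seen.add(target)
                        next_frontier.append(target)
        frontier = next_frontier
    return facts

# These facts would be injected into the LLM prompt as grounded context.
print("\n".join(collect_facts("Acme Corp")))
```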

The connection between LLMs and enterprise-specific data management is critical in both cases. By ensuring the data feeding these augmentation techniques is of the highest fidelity, enterprises can unlock the full potential of LLMs to provide accurate, contextually relevant, and company-specific responses. This requires a commitment to ongoing data governance, quality assurance, and maintenance to keep the LLM's knowledge base up-to-date and aligned with the evolving needs of the enterprise.

Feeding LLMs With Trusted, Unified Data

Modern cloud-native data unification and management approaches are essential tools to address the challenges of training LLMs. These approaches include entity resolution, master data management, and data products, collectively serving as the central nervous system for business-critical core data. By unifying data from disparate sources and feeding it to LLMs in a consistent, accurate, and real-time manner, these systems help close the trust gap and ensure the models produce reliable outputs.
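As a rough illustration of the entity-resolution step mentioned above, the sketch below merges duplicate customer records from two hypothetical source systems using a naive normalized-phone match; matching rules in real MDM platforms are far more sophisticated, with fuzzy name matching, survivorship rules, and stewardship workflows.

```python
# Simplified entity-resolution pass: records from two source systems are
# matched on a normalized phone number and merged into one "golden" record.

crm = [{"name": "ACME Corp.", "phone": "555-0100", "source": "crm"}]
erp = [{"name": "Acme Corporation", "phone": "5550100", "source": "erp"}]

def match_key(record: dict) -> str:
    """Normalize the phone number as a crude matching key."""
    return "".join(ch for ch in record["phone"] if ch.isdigit())

golden: dict[str, dict] = {}
for record in crm + erp:
    key = match_key(record)
    merged = golden.setdefault(key, {"sources": []})
    merged.setdefault("name", record["name"])   # keep the first name seen
    merged["sources"].append(record["source"])

print(golden)   # one unified entity assembled from both systems
```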

One key aspect of a robust data unification and management solution is canonical data models. These models provide the flexibility to seamlessly unify data from various sources across different entity types, ensuring consistency and accuracy. Scalability is another critical feature, as the volume of data generated and consumed continues to grow at an unprecedented rate. A cloud-native, scalable architecture is essential to handle this continuous growth and ensure LLMs can access the most up-to-date information.
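A simple way to picture a canonical data model is as a single target shape that every source system maps into; the fields and mapping functions below are illustrative only.

```python
from dataclasses import dataclass

# A canonical model gives every source system one target shape to map into.
# The fields and source formats below are illustrative only.

@dataclass
class CanonicalCustomer:
    customer_id: str
    legal_name: str
    country: str

def from_crm(raw: dict) -> CanonicalCustomer:
    return CanonicalCustomer(raw["id"], raw["company"], raw["country_code"])

def from_billing(raw: dict) -> CanonicalCustomer:
    return CanonicalCustomer(raw["acct_no"], raw["acct_name"], raw["geo"])

records = [
    from_crm({"id": "c-001", "company": "Acme Corp", "country_code": "DE"}),
    from_billing({"acct_no": "c-001", "acct_name": "Acme Corp", "geo": "DE"}),
]
print(records)   # both sources now share one consistent structure
```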

In addition to scalability, API-first performance is crucial for real-time data availability and automation. APIs can enable seamless integration between the data unification platform and the LLMs, allowing rapid data access and processing. This real-time data availability ensures that the LLMs always work with the most current and accurate information, reducing the risk of generating outdated or inconsistent content.
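The sketch below shows what API-first, request-time grounding could look like; the endpoint, payload, and field names are hypothetical stand-ins, and the network call is stubbed so the example runs on its own.

```python
import json

# Sketch of API-first grounding: the LLM layer pulls a fresh entity record
# from the unification platform at request time instead of relying on what
# the model memorized during training. Endpoint and payload are hypothetical.

def fetch_entity(entity_id: str) -> dict:
    """In production this would be an authenticated HTTPS call, e.g.
    GET https://unification.example.com/entities/{entity_id}.
    Here it returns a canned payload so the sketch runs on its own."""
    return {"id": entity_id, "legal_name": "Acme Corp", "status": "active",
            "last_updated": "2024-06-01T12:00:00Z"}

def prompt_with_live_context(entity_id: str, question: str) -> str:
    entity = fetch_entity(entity_id)
    return ("Use only this live record when answering.\n"
            f"Record: {json.dumps(entity)}\n"
            f"Question: {question}")

print(prompt_with_live_context("c-001", "Is this account currently active?"))
```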

The importance of data unification and management tools in LLM usage cannot be overstated. By providing trusted, unified data through canonical data models, scalable architectures, API-first performance, easy configuration, and security compliance, these solutions ensure that LLMs are fed accurate, real-time information. With that foundation in place, organizations can harness the full potential of these powerful models and build trust in the content they generate.

Lay the Data Foundation Now To Reap Future LLM Rewards

As AI progresses and becomes more integrated into our daily lives, the need for trusted, high-fidelity data to feed these models has never been more critical. While today’s LLMs are limited, they are still incredibly useful, offering amazing potential for businesses and consumers.

The cost of ignoring the need for clean, trusted data is high. In addition to the opportunity cost of not delivering best-in-class customer experiences, businesses also face the risk of significant regulatory fines and reputational damage. For example, the European Union's General Data Protection Regulation (GDPR) imposes fines of up to €20 million or 4% of a company's global annual revenue for data privacy violations. Moreover, the erosion of consumer trust resulting from biased or misleading AI outputs can have long-lasting effects on a brand's reputation and bottom line.

In today’s data-driven world, AI is mandatory to stay competitive. But artificial intelligence, including LLMs, is only as strong as its data foundation. Modern data unification and management uniquely satisfy the needs of enterprise AI by curating complete, timely core data at scale. Treat your organization’s data as a strategic asset and invest in modern tools needed to fuel AI with trusted data. Lay the data groundwork now to reap AI rewards for years to come.

About the Author

Anshuman (“Ansh”) Kanwar is Senior Vice President of Technology at Reltio. He leads the building and testing of Reltio’s multi-cloud core platform, as well as data domain-specific solutions. He is also responsible for critical non-functional capabilities, such as scale and performance.

Ansh has extensive experience in product management, agile software development, security, cloud computing, data center and network infrastructure, and DevOps at scale. Over the last 20 years, he has held numerous senior technical and management roles, including at Citrix Systems, where he served as Vice President for technology operations, and LogMeIn, where he served as Chief Technology Officer. Most recently, he was GM of products and technology at Onapsis. Ansh has a bachelor’s in computer engineering from Delhi University, an MS in electrical and computer science from the University of California, Santa Barbara, and an MBA from the MIT Sloan School of Management.
