Grounding Language Models with Vector Search: Enhancing Accuracy and Control

TL;DR:

  • Large language models (LLMs) have limitations in generating accurate responses and can produce hallucinations.
  • LLMs are neural networks trained on text databases and use vector embeddings for text representation.
  • Microsoft’s Semantic Kernel provides a framework for adding vector search to LLMs, enabling grounding in specific data sources.
  • Vector search allows for constraining LLM output by querying stored embeddings and producing semantically correct responses.
  • Embeddings are essential tools for working with LLMs, offering various applications such as text classification and generation.
  • Tools like LangChain and Semantic Kernel simplify the integration of LLMs, vector search, and application APIs.
  • Grounding LLMs in business data sources enhances confidence in outputs and enables self-service support chatbots.

Main AI News:

Large language models have proven their capabilities, but they also come with limitations. We’ve witnessed instances of models like ChatGPT fabricating legal cases and generating nonsensical responses. These shortcomings stem from the nature of transformer models, which are neural networks trained on extensive databases of text. However, the text a model actually works with isn’t stored as standard English or any other natural language.

To understand why these models behave as they do, we need to delve into their workings. The training process breaks text down into chains of syllable-like tokens, which are then converted into vector representations known as “embeddings.” These embeddings reside in a multidimensional semantic space, creating a framework through which the model constructs paths probabilistically. Those paths manifest as text, images, or even speech.
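A quick way to see that first step is to run a tokenizer directly. The minimal sketch below uses OpenAI’s tiktoken library as one example; the fragments it prints are the sub-word tokens the model then maps to embedding vectors (the embedding lookup itself happens inside the model).

```python
# Minimal sketch: inspecting the tokens an OpenAI-style model works with, using tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by recent OpenAI models
text = "Large language models complete text one token at a time."

token_ids = enc.encode(text)
print(token_ids)  # the integer IDs the model actually sees
print([enc.decode_single_token_bytes(t) for t in token_ids])  # the sub-word fragments behind them
```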

The Role of Large Language Models

In essence, a large language model (LLM) serves as a “Text Completer.” It predicts the most probable chain of tokens to follow a given prompt, traversing a multidimensional semantic space and generating fresh output tokens along the path. However, as the path extends, the output becomes less connected to the initial prompt, resulting in what are known as “hallucinations”: outputs that are semantically plausible but lack grounding in reality.
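That completion behavior is easy to demonstrate. The sketch below uses Hugging Face’s transformers library with GPT-2, chosen only because it is small and freely downloadable; nothing in the loop checks the generated text against facts, which is exactly why grounding matters.

```python
# Minimal sketch: an LLM as a "Text Completer", using Hugging Face transformers and GPT-2.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The court ruled that"
inputs = tokenizer(prompt, return_tensors="pt")

# Each step samples a likely next token; the model never verifies the result.
output = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```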

It’s important to note that these hallucinations don’t indicate a flaw in the underlying LLM. Rather, they reflect the model’s operation within an unbounded semantic space, lacking a mechanism to keep its output grounded. This is why ChatGPT produces more accurate responses when paired with plugins or when Microsoft’s Copilots integrate with domain-specific knowledge.

Mitigating Errors with Constraints

To mitigate the risks associated with generative AI, it’s possible to reduce the space within which an LLM operates. This can be achieved by restricting the model to specific texts, such as a catalog or a company document store. Another approach is to tie the outputs to prompts generated by a particular service or application.

Both options require the grounding content to be delivered in a vector format compatible with the LLM’s embeddings, whether that content lives in a vector database or is retrieved by a vector search over an existing corpus. Microsoft’s Prometheus model, which wraps GPT-4, exemplifies this approach by utilizing Bing as the data source.

Grounding LLMs with Vector Search

So, how can we ground an LLM effectively?

Microsoft’s Semantic Kernel tooling offers valuable insight into incorporating vector search into models. It provides a pipeline-based workflow that integrates OpenAI’s GPT models, transformer models from Hugging Face, and Microsoft’s Azure OpenAI service. This extensibility framework enables the use of LLMs within application workflows using familiar programming languages and tools.

Semantic Kernel leverages vector search and vector databases to introduce “semantic memory” to AI applications, alongside traditional API calls to various services. To work with complex texts, vector embeddings are required, which can be stored in vector search index tables within suitable databases or specialized vector databases.
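To make the idea concrete, here is a toy in-memory version of that pattern. It is not Semantic Kernel’s actual API, just a sketch of what a vector store does: save each text with its embedding, then rank stored texts by cosine similarity against a query. The `embed` callable is a placeholder for whichever embedding model you use.

```python
# Toy "semantic memory": store texts with their embeddings, retrieve by cosine similarity.
from typing import Callable, List, Tuple
import numpy as np

class SemanticMemory:
    def __init__(self, embed: Callable[[str], np.ndarray]):
        self.embed = embed          # placeholder for your embedding model
        self.texts: List[str] = []
        self.vectors: List[np.ndarray] = []

    def save(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(np.asarray(self.embed(text)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[float, str]]:
        q = np.asarray(self.embed(query))
        scores = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        ranked = sorted(zip(scores, self.texts), reverse=True)
        return ranked[:top_k]
```

A real deployment would swap this class for a vector search index in a database or a dedicated vector store, but the save-and-search contract stays the same.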

Harnessing Text Embeddings

Most LLMs offer their own embedding tools to convert strings into embeddings, which makes it quick to use those embeddings as index terms for strings in your own data stores. OpenAI currently recommends the text-embedding-ada-002 model for its low cost and improved accuracy over its predecessors, and estimates that a typical document runs to roughly 800 tokens per page.
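As a sketch, generating an embedding with that model looks like the snippet below, assuming the 1.x version of the openai Python SDK and an OPENAI_API_KEY in the environment. A function like this could also stand in for the `embed` placeholder in the semantic-memory sketch above.

```python
# Minimal sketch: generating an embedding with OpenAI's text-embedding-ada-002 (openai 1.x SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding

vector = embed("Invoices are due within 30 days of receipt.")
print(len(vector))  # ada-002 produces 1,536-dimensional vectors
```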

Embeddings serve multiple purposes when working with LLMs. They facilitate tasks like text classification, summarization, translation management, and even text generation. Leveraging stored content, applications can generate outputs based on these embeddings, thus controlling the LLM’s output.
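For example, a rough zero-shot classifier falls out of embeddings almost for free: embed a piece of text and a short description of each category, then pick the closest match. The sketch below assumes an `embed` function like the one above; the label names and ticket example are hypothetical and only illustrate the idea.

```python
# Minimal sketch: classifying text by comparing its embedding against label descriptions.
import numpy as np

def cosine(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(text: str, labels: dict[str, str], embed) -> str:
    """labels maps each category name to a short description of that category."""
    text_vec = embed(text)
    scores = {name: cosine(text_vec, embed(desc)) for name, desc in labels.items()}
    return max(scores, key=scores.get)

# Hypothetical example: routing support tickets without training a dedicated classifier.
labels = {
    "billing": "questions about invoices, payments, and refunds",
    "technical": "error messages, crashes, and installation problems",
}
# classify("My invoice shows the wrong amount", labels, embed)  -> "billing"
```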

Controlling LLM Output with Vector Search

To maintain control over LLM output, it’s crucial to draw on your own data sets. Rather than relying solely on the LLM’s training data, you use your own data to form the core content of an answer, with the LLM acting as a semantic wrapper that presents that content in an accurate context. The key step is constructing an appropriate prompt from the original query and the retrieved embedding data, and tools like LangChain and Semantic Kernel can orchestrate this for you.
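Stripped of the orchestration that LangChain or Semantic Kernel would provide, the pattern is roughly the sketch below: retrieve the best-matching passages from your own data (here, the toy SemanticMemory from earlier), fold them into the prompt, and ask the model to answer only from that context. The model name and the openai 1.x SDK usage are assumptions for illustration.

```python
# Minimal retrieval-augmented sketch: ground the answer in your own retrieved passages.
from openai import OpenAI

client = OpenAI()

def answer(question: str, memory) -> str:
    # Pull the passages from your own data that best match the question.
    passages = [text for _, text in memory.search(question, top_k=3)]
    context = "\n\n".join(passages)

    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```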

An intriguing aspect arises when the base documents are self-labeled, as with Microsoft’s Office Open XML or the OpenDocument format: embedding these formats captures the semantic structure of both document layout and text. Docugami, a document automation specialist, transforms documents into its own Document XML Knowledge Graph and stores the resulting embeddings in Redis VectorDB, enabling chat sessions in which an LLM extracts information from collections of business documents, grounding interactions in your own data.

Building on the Foundation

To obtain accurate responses from an LLM, it’s crucial to remember that the model itself is a foundation. Similar to constructing a home, achieving desirable results requires building upon that foundation. Tools like Semantic Kernel or LangChain can orchestrate LLMs, vector search, and application APIs, allowing you to enhance and customize the model’s performance.

By grounding an LLM in your own data sources, you gain confidence in the outputs. This is particularly valuable when dealing with prompts that might not have an explicit answer.

Conclusion:

The integration of vector search with large language models presents significant implications for the market. Businesses can now enhance the accuracy and control of their AI applications by grounding LLMs in specific data sources. This approach allows for more reliable responses, improved customer interactions, and the development of self-service support solutions. By leveraging vector search and embeddings, organizations can harness the power of language models while ensuring alignment with their own data, leading to more confident and valuable outputs in various business scenarios.

Source