Intro to RAG: Foundations of Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) may sound complex, but the name accurately describes how the system works. RAG is a method that enhances the capabilities of Large Language Models (LLMs) by integrating them with external knowledge sources.
Each term represents a piece of the puzzle:
- Retrieval - data is retrieved from a source outside the LLM (most often a database, but also files, webpages, etc.).
- Augmented - the retrieved data "augments" (adds to) an LLM's training data, usually supplying recent or private information the model did not have access to during training.
- Generation - the LLM generates a response (text, image, video, etc.) grounded in the retrieved data provided in the input.
Instead of relying solely on the model’s internal training data, RAG retrieves relevant information from databases or document collections to ground its responses in factual and up-to-date content. This method improves the accuracy and reliability of the generated outputs and adapts to specific contexts or domains.
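The retrieve, augment, and generate steps described above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the sample documents and the keyword-overlap retriever are invented for the example (a real system would use vector embeddings, covered below), and the final LLM call is left as a comment.

```python
import re

# Minimal sketch of the RAG flow: retrieve -> augment -> generate.

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours: Monday through Friday, 9am to 5pm.",
    "Premium subscribers get priority shipping on all orders.",
]

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=1):
    """Retrieval: rank documents by naive keyword overlap with the query."""
    return sorted(docs, key=lambda d: len(tokenize(query) & tokenize(d)), reverse=True)[:k]

def augment(query, context):
    """Augmentation: build a prompt that grounds the model in retrieved data."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "What is the refund policy?"
context = "\n".join(retrieve(query, documents))
prompt = augment(query, context)
# Generation: `prompt` would now be sent to an LLM to produce the answer.
```

Swapping the keyword scorer for an embedding-based similarity search turns this sketch into the semantic retrieval discussed later in this post.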
In this post, we will define each component and show how they work together in a RAG system.
Why RAG?
Retrieval augmented generation (RAG) solves a few different problems in the technical space:
- It dynamically adds data/information to (augment) an LLM’s knowledge, improving relevance and accuracy.
- It provides searchable access to many kinds of data - database records, text, images, audio, video, webpages, etc.
- It allows technical experts to guide or constrain the AI with defined tools, high-quality data, rules, and business logic, increasing accuracy and reducing risk.
Large Language Models (LLMs)
Part of the Generative AI field, Large Language Models (LLMs) generate content by predicting the most probable next token given the sequence so far. They excel at understanding context and producing coherent outputs, but they also have limitations, such as generating inconsistent responses when multiple answers are plausible.
Note:
Each LLM is trained differently, prioritizing certain probabilities over others to optimize for specific goals. This is why every LLM may produce different outputs for the same input.
Many LLMs tend to hallucinate (or produce inaccurate answers) due to:
- Questions about recent or private data that falls outside their training set.
- Prompts that traverse gaps in the LLM's knowledge.
- Lack of context to guide answers, leading to uncertainty.
Vector embeddings
A vector is a mathematical representation with size (magnitude) and direction. Vectors appear in many familiar use cases, including:
- Airplane flight paths: Representing paths in 3D space and calculating course changes needed to avoid obstacles.
- Rocket trajectories: Navigating multi-dimensional space and avoiding celestial obstacles.
Vectors applied to words
In 2013, Google's word2vec created numeric representations of words, allowing for the comparison of similar words. This ability is foundational for many natural language processing tasks.
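To make this concrete, here is a toy sketch with hand-picked 3-dimensional word vectors. The words, axes, and values are invented for illustration; real word2vec embeddings have hundreds of dimensions learned from large text corpora rather than assigned by hand.

```python
import math

# Toy "word vectors" along hand-picked axes
# (roughly: animal-ness, vehicle-ness, size).
vectors = {
    "cat":    [0.9, 0.0, 0.2],
    "kitten": [0.9, 0.0, 0.1],
    "truck":  [0.0, 0.9, 0.8],
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Similar words end up close together in the vector space.
print(distance(vectors["cat"], vectors["kitten"]))  # small
print(distance(vectors["cat"], vectors["truck"]))   # large
```

Because "cat" and "kitten" point in nearly the same direction, the numbers alone are enough to tell that the words are related, with no dictionary involved.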
Vectors applied to data
Vector embeddings are numerical representations capturing semantic meaning, enabling semantic searches that go beyond keyword matching. They allow for precise identification and retrieval of information.
Similarity search
Similarity search involves finding data records that resemble a given query, commonly using techniques like k-Nearest Neighbors (k-NN) or approximate methods for efficiency.
To measure similarity, common metrics include:
- Cosine similarity: Measures the angle between vectors.
- Euclidean distance: Measures the straight-line distance between two points.
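Both metrics, along with a brute-force k-NN search over them, fit in a short Python sketch. The sample vectors here are arbitrary, chosen only to illustrate the calculations.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = perpendicular."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two points: 0.0 = identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, vectors, k=2):
    """Exact k-nearest-neighbors by cosine similarity (brute force)."""
    return sorted(vectors, key=lambda v: cosine_similarity(query, v), reverse=True)[:k]

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.1]
print(cosine_similarity(a, b))   # perpendicular vectors -> 0.0
print(knn(a, [b, c], k=1))       # c points nearly the same way as a
```

Brute-force k-NN compares the query against every stored vector, which is exact but slow at scale; this is why production vector databases use the approximate methods mentioned above.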
Note:
Different models will yield different vector embeddings, affecting similarity search results.