Introduction to RAG (Retrieval-Augmented Generation)
RAG (Retrieval-Augmented Generation) is an innovative approach that combines the strengths of information retrieval and generative models to enhance the way we access and use information.
By integrating a retrieval mechanism, RAG allows large language models to access and utilize external documents or knowledge bases, providing more accurate and contextually relevant responses.
How RAG Works
- 🔎 Finding Information: When a query is made, RAG searches through a large collection of documents, databases, or online sources to find the most relevant and useful data.
- 🖋️ Generating Responses: Using the retrieved information, RAG generates clear and accurate responses tailored to the specific query.
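The two steps above can be sketched as a small retrieve-then-generate loop. This is a minimal sketch, assuming a canned retriever and LLM: `search_documents` and `call_llm` are hypothetical stand-ins, not real APIs.

```python
# Minimal sketch of the RAG loop; `search_documents` and `call_llm` are
# hypothetical stand-ins for a real retriever and language model.

def search_documents(query: str) -> list[str]:
    # Step 1, finding information: return the most relevant documents.
    return ["Paris is the capital of France."]  # canned result for the sketch

def call_llm(prompt: str) -> str:
    # Step 2, generating a response: a real LLM would complete this prompt.
    return "Paris"  # canned answer for the sketch

def rag_answer(query: str) -> str:
    context = "\n".join(search_documents(query))
    prompt = f"Using only this context:\n{context}\n\nAnswer the question: {query}"
    return call_llm(prompt)

print(rag_answer("What is the capital of France?"))  # prints "Paris"
```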
Key concepts
Glossary
- Large Language Model (LLM): A type of artificial neural network (mostly autoregressive attention-based networks, a.k.a. Transformers) trained on trillions of documents to predict the most likely next word, e.g. {prompt: 'The capital of France is...', answer: 'Paris'}. Called recursively, these models can generate whole sentences, contextualised by the provided prompt.
- EmbeddingsModel: A subtype of LLM trained to generate vector representations of text.
- Embeddings: The vector representation of a text or document.
- Vector Database: A database that enables similarity search (approximate matching), which is particularly convenient for retrieving relevant, contextual information. It is fed with the vectors generated by embeddings models.
- Similarity search: Retrieves the k closest documents to the input query, based on a distance measure (most often cosine distance or dot product).
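To make similarity search concrete, here is a toy top-k search over hand-made 2-D vectors. The document names and vector values are invented for the example, and real embeddings have hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 2-D "embeddings"; a real vector database holds many more, larger vectors.
documents = {
    "doc_cats": [0.9, 0.1],
    "doc_dogs": [0.8, 0.3],
    "doc_cars": [0.1, 0.95],
}

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Similarity search: return the k documents closest to the query vector."""
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(query_vec, documents[d]),
                    reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0]))  # → ['doc_cats', 'doc_dogs']
```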
Most RAG use cases implement two different but complementary workflows.
Phase 1: Document vectorisation
This workflow populates the vector database with the appropriate data (document vectors).
```mermaid
graph LR
rawdoc[Raw Document] -->|document transformer| doc[Document] -->|EmbeddingsModel| vec[Document Vector] --> db[(Vector Database)]
```
In RAG-Core, this phase can be handled by the `EmbeddingsPipeline` class.
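As an illustration of the transform → embed → store flow (not the actual `EmbeddingsPipeline` API), here is a sketch in which the chunking strategy and the character-frequency "embedding" are invented stand-ins:

```python
# Illustrative document-vectorisation flow; none of these helpers are RAG-Core APIs.

def transform(raw: str, size: int = 40) -> list[str]:
    """Document transformer: a naive split into fixed-size chunks."""
    return [raw[i:i + size] for i in range(0, len(raw), size)]

def embed(chunk: str) -> list[float]:
    """Stand-in embeddings model: normalised character frequencies over a-z."""
    counts = [chunk.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    total = sum(counts) or 1
    return [c / total for c in counts]

vector_db: list[tuple[str, list[float]]] = []  # stand-in for a vector database

raw_document = "RAG combines retrieval with generation to ground LLM answers."
for chunk in transform(raw_document):
    vector_db.append((chunk, embed(chunk)))  # store (chunk, vector) pairs
```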
Phase 2: Answer generation
This workflow produces textual content (an answer, a document, code, etc.) from a user input.
```mermaid
graph LR
query[User query] --> |EmbeddingsModel| qv[Query Vector] --> db[(Vector Database)] --> docs[Retrieved documents]
docs --> prompt[Augmented prompt]
query --> prompt
prompt -->|LLM| answer[Answer]
```
In RAG-Core, this phase can be handled by the `RAG` class.
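A hedged end-to-end sketch of this phase, embed the query, retrieve by similarity, then build the augmented prompt. The toy `embed` function, the in-memory vector "database", and the prompt template are assumptions for illustration, not the actual `RAG` class API.

```python
# Illustrative answer-generation flow; the helpers below are not RAG-Core APIs.
import math

def embed(text: str) -> list[float]:
    """Toy embeddings model: normalised character frequencies over a-z."""
    counts = [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Vector database populated during phase 1 (here: a plain dict).
vector_db = {doc: embed(doc) for doc in [
    "Paris is the capital of France.",
    "Rust guarantees memory safety without a garbage collector.",
]}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Similarity search: rank documents by cosine similarity to the query."""
    qv = embed(query)
    return sorted(vector_db, key=lambda d: cosine(qv, vector_db[d]), reverse=True)[:k]

query = "What is the capital of France?"
context = "\n".join(retrieve(query))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
# A real system would now send `augmented_prompt` to an LLM to get the answer.
```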
Key Benefits of RAG
- Accurate Answers: Combines information retrieval with content generation to provide precise and contextually appropriate answers.
- Efficiency: Reduces the time and effort required for manual searches and data compilation by quickly finding and synthesizing information.
- Enhanced Decision Making: Offers detailed insights and comprehensive answers, aiding in better-informed decision-making processes.
Some Application Examples
- ✨ Content Suggestion: Assists content creators in producing high-quality content with real-time suggestions based on proprietary content.
- ⚗️ Insight Research: Gathers and synthesizes information from various sources, providing detailed reports and analysis.
- 💬 Open Q&A: Extends a traditional FAQ with open question fields, answering most of your users' concerns.
- 🙋 Customer Support: Improves customer service by providing accurate and timely responses to customer queries.