Introduction to RAG (Retrieval-Augmented Generation)
RAG (Retrieval-Augmented Generation) is an innovative approach that combines the strengths of information retrieval and generative models to enhance the way we access and use information.
By integrating a retrieval mechanism, RAG allows large language models to access and utilize external documents or knowledge bases, providing more accurate and contextually relevant responses.
How RAG Works
- 🔎 Finding Information: When a query is made, RAG searches through a large collection of documents, databases, or online sources to find the most relevant and useful data.
- 🖋️ Generating Responses: Using the retrieved information, RAG generates clear and accurate responses tailored to the specific query.
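The two steps above can be sketched as a small retrieve-then-generate loop. This is a minimal sketch, assuming a canned retriever and LLM: `search_documents` and `call_llm` are hypothetical stand-ins, not real APIs.

```python
# Minimal sketch of the RAG loop; `search_documents` and `call_llm` are
# hypothetical stand-ins for a real retriever and language model.

def search_documents(query: str) -> list[str]:
    # Step 1, finding information: return the most relevant documents.
    return ["Paris is the capital of France."]  # canned result for the sketch

def call_llm(prompt: str) -> str:
    # Step 2, generating a response: a real LLM would complete this prompt.
    return "Paris"  # canned answer for the sketch

def rag_answer(query: str) -> str:
    context = "\n".join(search_documents(query))
    prompt = f"Using only this context:\n{context}\n\nAnswer the question: {query}"
    return call_llm(prompt)

print(rag_answer("What is the capital of France?"))  # prints "Paris"
```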
Key concepts
Glossary
- Large Language Model (LLM): A type of artificial neural network (mostly autoregressive attention-based networks, a.k.a. Transformers) trained on trillions of documents to predict the most likely next word, e.g. {prompt: 'The capital of France is...', answer: 'Paris'}. Called recursively, these models can generate whole sentences, contextualised by the provided prompt.
- EmbeddingsModel: A subtype of LLM trained to generate vector representations of text.
- Embeddings: The vector representation of a text or document.
- Vector Database: A database that enables similarity search (approximate matching), which is particularly convenient for retrieving relevant, contextual information. It is fed with the vectors generated by embeddings models.
- Similarity search: Retrieves the k closest documents to the input query, based on a distance measure (most often cosine distance or dot product).
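To make similarity search concrete, here is a toy top-k search over hand-made 2-D vectors. The document names and vector values are invented for the example, and real embeddings have hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 2-D "embeddings"; a real vector database holds many more, larger vectors.
documents = {
    "doc_cats": [0.9, 0.1],
    "doc_dogs": [0.8, 0.3],
    "doc_cars": [0.1, 0.95],
}

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Similarity search: return the k documents closest to the query vector."""
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(query_vec, documents[d]),
                    reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0]))  # → ['doc_cats', 'doc_dogs']
```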
Most RAG use cases implement two different but complementary workflows.
Phase 1: Document vectorisation
This workflow populates the vector database with the appropriate data (document vectors).
```mermaid
graph LR
rawdoc[Raw Document] -->|document transformer| doc[Document] -->|EmbeddingsModel| vec[Document Vector] --> db[(Vector Database)]
```
In RAG-Core, this phase can be handled by the `EmbeddingsPipeline` class.
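As an illustration of the transform → embed → store flow (not the actual `EmbeddingsPipeline` API), here is a sketch in which the chunking strategy and the character-frequency "embedding" are invented stand-ins:

```python
# Illustrative document-vectorisation flow; none of these helpers are RAG-Core APIs.

def transform(raw: str, size: int = 40) -> list[str]:
    """Document transformer: a naive split into fixed-size chunks."""
    return [raw[i:i + size] for i in range(0, len(raw), size)]

def embed(chunk: str) -> list[float]:
    """Stand-in embeddings model: normalised character frequencies over a-z."""
    counts = [chunk.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    total = sum(counts) or 1
    return [c / total for c in counts]

vector_db: list[tuple[str, list[float]]] = []  # stand-in for a vector database

raw_document = "RAG combines retrieval with generation to ground LLM answers."
for chunk in transform(raw_document):
    vector_db.append((chunk, embed(chunk)))  # store (chunk, vector) pairs
```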
Phase 2: Answer generation
This workflow produces textual content (an answer, a document, code, etc.) from a user input.
```mermaid
graph LR
query[User query] --> |EmbeddingsModel| qv[Query Vector] --> db[(Vector Database)] --> docs[Retrieved documents]
docs --> prompt[Augmented prompt]
query --> prompt
prompt -->|LLM| answer[Answer]
```
In RAG-Core, this phase can be handled by the `RAG` class.
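A hedged end-to-end sketch of this phase, embed the query, retrieve by similarity, then build the augmented prompt. The toy `embed` function, the in-memory vector "database", and the prompt template are assumptions for illustration, not the actual `RAG` class API.

```python
# Illustrative answer-generation flow; the helpers below are not RAG-Core APIs.
import math

def embed(text: str) -> list[float]:
    """Toy embeddings model: normalised character frequencies over a-z."""
    counts = [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Vector database populated during phase 1 (here: a plain dict).
vector_db = {doc: embed(doc) for doc in [
    "Paris is the capital of France.",
    "Rust guarantees memory safety without a garbage collector.",
]}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Similarity search: rank documents by cosine similarity to the query."""
    qv = embed(query)
    return sorted(vector_db, key=lambda d: cosine(qv, vector_db[d]), reverse=True)[:k]

query = "What is the capital of France?"
context = "\n".join(retrieve(query))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
# A real system would now send `augmented_prompt` to an LLM to get the answer.
```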
Key Benefits of RAG
- Accurate Answers: Combines information retrieval with content generation to provide precise and contextually appropriate answers.
- Efficiency: Reduces the time and effort required for manual searches and data compilation by quickly finding and synthesizing information.
- Enhanced Decision Making: Offers detailed insights and comprehensive answers, aiding in better-informed decision-making processes.
Some Application Examples
- ✨ Content Suggestion: Assists content creators in producing high-quality content with real-time suggestions based on proprietary content.
- ⚗️ Insight Research: Gathers and synthesizes information from various sources, providing detailed reports and analysis.
- 💬 Open Q&A: Extends a traditional FAQ with open question fields, answering most of your users' concerns.
- 🙋 Customer Support: Improves customer service by providing accurate and timely responses to customer queries.