
Quickstart: Your first RAG

You will learn how to build a small RAG application using RAG-Core by:

  1. Setting up your environment
  2. Creating a queryable document store
  3. Serving your app in the API
  4. Adding generative question answering capabilities

Step 1: Set up your environment

rag.py will contain most of your code. Create it.

mkdir my_app
touch my_app/rag.py

When running this example, you'll need to set a few environment variables to enable debug output and serve static resources.

If you don't have a .env file, create one with this content:

LOG_LEVEL=DEBUG
STATIC_FILE_URL=/static
DOCS_URL=/docs

To know what they're doing, refer to the environment variables documentation.

Step 2: Build a Queryable Document Resource

a. Initialize your document resource

The first step is to convert your documents into queryable vectors.

Let's define a Resource to store these vectorized documents.

In RAG-Core, a Resource represents a named entity containing retrievable documents to be injected into prompts.

import os

from langchain_openai import OpenAIEmbeddings

from rag.resources.local import LocalVectorResource

embeddings_model = OpenAIEmbeddings(
    openai_api_key=os.getenv("OPENAI_API_KEY"), model="text-embedding-3-small"
)

resource = LocalVectorResource(
    name="local_resource",
    index_path="local_index",
    persistent=True,
    embeddings_model=embeddings_model,
)
Developer tip

We don't know what your data will look like, so we have left out some configuration here.

One of the most important options, if you have a process that updates your documents, is the metadata field name that contains your document's unique ID; providing it prevents data duplication inside your vector database.

If your document's unique ID is stored in $.metadata.udid, pass "udid" to the document_metadata_id parameter:

resource = LocalVectorResource(
    name="local_resource",
    index_path="local_index",
    persistent=True,
    embeddings_model=embeddings_model,
    document_metadata_id="udid"
)
About LocalVectorResource

This class uses the InMemoryVectorStore, which stores vectors in a dict. It is the lightest vector store currently available.
InMemoryVectorStore is not intended for production purposes, and neither is LocalVectorResource.

It should be limited to educational or demonstration use cases.

In InMemoryVectorStore, vectors are not compressed and are stored and persisted locally. Therefore, we strongly recommend using it with small document datasets (fewer than 1,000 documents).


What about production then?
Production use cases should prefer remote vector storage such as PineconeResource, or a local index dedicated to vector storage such as FAISSResource.
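
For illustration only, switching to a dedicated index could look like the sketch below. The import path and constructor arguments are assumptions modelled on the LocalVectorResource example above, not the documented API, so check the resources reference before relying on them.

# Hypothetical sketch: replacing LocalVectorResource with a FAISS-backed resource.
# The module path and parameters below are assumptions, not RAG-Core's documented API.
from rag.resources.faiss import FAISSResource  # assumed import path

resource = FAISSResource(
    name="faiss_resource",
    index_path="faiss_index",            # assumed: where the FAISS index is persisted
    embeddings_model=embeddings_model,
)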

b. Create an EmbeddingsPipeline

Now we build a pipeline to vectorize our documents and store them in our resource.
An EmbeddingsPipeline sequentially applies a list of document transformers to preprocess your data, then runs an embeddings model and adds the resulting vectorized documents to your Resource.

Let's start with a simple example with no custom document preprocessing.

from rag.core.embeddings import EmbeddingsPipeline

embeddings_pipeline = EmbeddingsPipeline(
    name="Quick Start Embeddings", 
    vectorstore=resource
)
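
If your documents do need preprocessing, a pipeline with a transformer could look like the sketch below. The document_transformers parameter name is an assumption used for illustration; check the EmbeddingsPipeline reference for the actual argument.

# Hypothetical sketch: an EmbeddingsPipeline with a chunking step before embedding.
# The `document_transformers` parameter name is an assumption, not the documented API.
from langchain_text_splitters import RecursiveCharacterTextSplitter

from rag.core.embeddings import EmbeddingsPipeline

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

embeddings_pipeline = EmbeddingsPipeline(
    name="Quick Start Embeddings",
    vectorstore=resource,
    document_transformers=[splitter],  # assumed parameter: transformers run before embedding
)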

Step 3: Serve Your App in the API

RAG-Core includes an automatically generated API.

This API is automatically documented following the OpenAPI standard, enabling developers to understand your application and build upon it.

Let's register our brand new EmbeddingsPipeline:

a. Create an API configuration file

touch my_app/api.yaml

b. Configure your API

In api.yaml, add:

info:
  name: "Your API name"
  description_path: "my_app/api_description.md" # Update the API description with your needs
  contact:
    name: "John Doe" # You
    email: "john.doe@infopro-digital.com" # Your mail
services:
  - name: "Quick Start Embeddings" # Your EmbeddingsPipeline name
    description: "Document embeddings service for a quick start"
    api_prefix: "quickstart_embeddings" # http://{api_url}/quickstart_embeddings/
    module_path: "my_app/rag.py" # The path to your app file

  • The info section contains generic information displayed on your OpenAPI documentation homepage (accessible by default at http://0.0.0.0:8000/docs)
  • The services section can contain several applications. All provided services will be automatically loaded and accessible through the same API client.

Info

module_path expects the file path or folder path (i.e. a Python module) where your RAG and EmbeddingsPipeline are instantiated.

c. Add your config to your environment variable

The API client detects the configuration by reading the API_CONFIG_PATH environment variable. Its default value is app/api.yaml. To load your newly created my_app, set it to your api.yaml file path:

export API_CONFIG_PATH="my_app/api.yaml"

That's it. The API module can build your custom service.

Environment variable management

Add this path to your .env file to persist this setting for further use.
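
For example, your .env file would then contain:

API_CONFIG_PATH=my_app/api.yaml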

d. Start the API

Simply run:

make api
Developer tip

Set AUTO_RELOAD=True and LOG_LEVEL=DEBUG in your .env file to enable logs and autoreload. In this mode, any change in your code is immediately applied to the running server.

Access your API documentation at http://0.0.0.0:8000/docs

Your embeddings service should appear like this.

RAG-CORE Embeddings API

👉 Add a few documents using the embedding /batch endpoint.

Note

page_content and metadata are the standard fields of LangChain Documents. Just add your content inside page_content.

You can insert more metadata if you want, but anything outside these two fields is discarded.
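
For instance, assuming the API runs locally on port 8000 and the /batch endpoint accepts a list of LangChain-style documents, a request could look like the sketch below; the exact payload layout is an assumption, so follow the schema shown on the /docs page.

# Hypothetical sketch: inserting two documents via the embeddings /batch endpoint.
# The payload layout is an assumption; check the OpenAPI schema at /docs.
import requests

documents = [
    {"page_content": "RAG-Core is a toolkit for building RAG applications.", "metadata": {"udid": "doc-1"}},
    {"page_content": "The quickstart uses an in-memory vector store.", "metadata": {"udid": "doc-2"}},
]

response = requests.post(
    "http://0.0.0.0:8000/quickstart_embeddings/batch",  # api_prefix from api.yaml
    json=documents,
)
print(response.status_code, response.json())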

Step 4: Add Generative Question Answering Capabilities

RAG-Core is compatible with any LangChain chain (i.e. any Runnable), which means endless customisation possibilities.

RAG-Core also provides a few built-in chains (in the rag.generation module), but you are likely to end up tuning your own prompt and diving into LangChain abstractions to fit your particular use case.

Let's build our own answer generation chain.

a. Build your custom generation chain

In my_app/rag.py:

from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that provide answers based on a list of document. If you don't know the answer, just say that you don't know"
            )
        ),
        HumanMessagePromptTemplate.from_template(
            " Context:\n {context}\n\n Question:\n {question}"
        ),
    ]
)

chain = prompt | llm  # the prompt formats the inputs, then the LLM generates the answer

Spend time exploring LangChain

If you are not familiar with LangChain and its LCEL, numerous examples are detailed in its documentation (LangChain Getting Started).
We strongly suggest having a look 👀 before jumping into your own chain customisation.

❗️ Important Note: RAG-Core will inject your documents with the keyword argument context and the user input with the keyword question.
Your prompt template must use these keywords to leverage both retrieved documents and the user question in its generated answer.
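
Before plugging the chain into RAG-Core, you can sanity-check the prompt wiring by invoking it directly with both keywords (this performs a real OpenAI call, so OPENAI_API_KEY must be set):

# Quick local test: the chain expects both `context` and `question` keys.
answer = chain.invoke(
    {
        "context": "1. RAG-Core exposes /invoke, /batch and /stream endpoints.",
        "question": "Which endpoints does RAG-Core expose?",
    }
)
print(answer.content)  # the generated answer, grounded in the provided context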

b. Plug your chain with your resources

A RAG object plugs a list of Resources into your chain:

from rag.core.generator import RAG

rag = RAG(
    name="Quick Start RAG",
    resources=[resource],
    generation_chain=chain,
)

Document aggregation

Under the hood, the RAG class transforms your resources into LangChain Retrievers.

At inference time, all retrievers are queried and the collected documents are numbered and aggregated. These documents are then injected into your chain via the context keyword of your prompt template. This mechanism allows any chain to seamlessly consume documents from an arbitrary number of Resource types.

This retrieval logic can be customized by creating a custom class inheriting from RAG, as sketched below.
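
As a purely illustrative sketch, a subclass could override the retrieval step, for example to keep only the top-ranked documents. The method name below is invented for illustration and is not necessarily RAG-Core's real extension point; check the RAG class reference for the actual hook.

# Hypothetical sketch: customizing retrieval by subclassing RAG.
# The `retrieve` method name is an assumption, not RAG-Core's documented API.
from rag.core.generator import RAG

class TopThreeRAG(RAG):
    def retrieve(self, question):
        documents = super().retrieve(question)  # query all registered resources
        return documents[:3]                    # keep only the three best matches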

c. Register your RAG in the API

In api.yaml, add the following to the services section:

services:
  - name: "Quick Start RAG"
    api_prefix: "quickstart_rag"
    description: "Question Answering service for a quick start"
    module_path: "my_app/rag.py" # The path to your app file

d. Check your program

Our new RAG application has been automatically added to the API.
Refresh your browser and ...

RAG-CORE RAG API
Tada

✔️ 3 new endpoints have been added:

  • /invoke: a regular REST endpoint; POST a question and the API replies with the generated answer plus the additional information provided by the generation chain (e.g. references, ids, etc.)
  • /batch: concurrent invocation to process several user inputs simultaneously
  • /stream: similar to /invoke, but the answer is returned in streamed chunks

You can test your RAG application from the API documentation page by clicking "Try it out". You have already used the embeddings /batch endpoint to insert data into your vector store.
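
You can also call the service programmatically. The sketch below assumes the /invoke endpoint accepts a JSON body carrying the user question; adapt the field names to whatever the OpenAPI schema shows for your service.

# Hypothetical sketch: querying the question answering service over HTTP.
# The request body layout is an assumption; check the /invoke schema at /docs.
import requests

response = requests.post(
    "http://0.0.0.0:8000/quickstart_rag/invoke",  # api_prefix from api.yaml
    json={"question": "What does the quickstart cover?"},
)
print(response.json())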


The end

🎉 Congrats!
You've implemented your first fully working RAG application!

If you are interested in exploring a more advanced example,
check out the USN Q&A Example.


Thanks for reading this far. We hope you've enjoyed it!