Resource

A Resource is an abstract Pydantic BaseModel representing a named object dedicated to the retrieval of documents.

It exposes a single property, retriever, which returns a LangChain BaseRetriever.

In short, a Resource can be anything with a name and a retriever.

Why use Pydantic?

Pydantic allows for type checking and schema validation. This feature ensures that users receive quick feedback if their implementation does not fit the framework requirements, preventing potential issues with other built-in features, such as automatic API generation.
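As a minimal sketch of this fast feedback, the hypothetical NamedResource model below (not part of RAG-Core) declares only a name field. Constructing it without a name fails immediately with a Pydantic ValidationError that points at the offending field.

```python
from pydantic import BaseModel, ValidationError


class NamedResource(BaseModel):
    """Hypothetical stand-in for a Resource; only the `name` field is modelled."""

    name: str


try:
    NamedResource()  # `name` is required but missing
except ValidationError as err:
    # Each error entry records the location of the invalid field
    invalid_fields = [error["loc"] for error in err.errors()]
    print(invalid_fields)
```

The error is raised when the object is built, well before any retrieval or API generation takes place.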

VectorResource

A VectorResource extends Resource, mainly by adding a vectorstore attribute (an object extending LangChain's VectorStore).

Info

Most VectorResource implementations require an embeddings_model (extending Embeddings) as an input argument.
It is used to initialize LangChain's VectorStore.

Add documents

It also provides a set of abstract methods facilitating document ingestion:

add_documents: Takes a list of documents as input, embeds them with the embeddings_model and stores them in the vectorstore.

Source

@abstractmethod
def add_documents(self, documents: List[Document]) -> None:
    """Embed documents and add them to the Langchain VectorStore object."""
    pass

aadd_documents: Similar to add_documents but asynchronous

Source

@abstractmethod
async def aadd_documents(self, documents: List[Document]) -> None:
    """Embed documents and add them to the Langchain VectorStore object asynchronously."""
    pass

Example

from langchain_core.documents import Document

# Create a document to add to the vector store
document_text = "This is a sample document to be added to the vector store."
document_metadata = {"title": "Sample Document", "author": "Author Name"}
document = Document(page_content=document_text, metadata=document_metadata)

# Add the document to the vector store
# (vector_store is assumed to be an already initialized VectorResource)
vector_store.add_documents([document])

Add texts

Utility function, mostly for educational purposes, wrapping the provided texts in a List[Document] and embedding them via the add_documents method.

Example

# Create a document to add to the vector store
text = "This is a sample document to be added to the vector store."
# Add the document to the vector store
vector_store.add_texts([text])

Also available in asynchronous mode via aadd_texts:

# Create a document to add to the vector store
text = "This is a sample document to be added to the vector store."
# Add the document to the vector store
await vector_store.aadd_texts([text])

Retriever

RAG-Core allows querying multiple Resources simultaneously.
To preserve information about the origin of each retrieved document, the VectorResource retriever automatically adds the name of the VectorResource that provided the document to that document's metadata.

To do so, the default VectorResource retriever pipes LangChain's retriever into a RunnableLambda (i.e. a custom Python function) that extends the metadata of the retrieved documents with a source item.

Source

@property
def retriever(self) -> VectorStoreRetriever:
    assert self.vectorstore is not None, "VectorStore is not initialized"
    return self.vectorstore.as_retriever() | RunnableLambda(
        self.add_source_in_retrived_documents_metadata
    )

def add_source_in_retrived_documents_metadata(
    self, documents: List[Document]
) -> List[Document]:
    for doc in documents:
        doc.metadata["source"] = self.__class__.__name__
    return documents

Custom Resource

Basic example

You can create your own Resource pretty easily by extending the associated class.

from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document

from langchain_core.retrievers import BaseRetriever

class UniversalRetriever(BaseRetriever):
    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        return [Document(page_content="The answer is 42")]


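# Assumption: Resource is importable from rag.core.resource, next to VectorResource
from rag.core.resource import Resource


class CustomResource(Resource):
    @property
    def retriever(self) -> BaseRetriever:
        return UniversalRetriever()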

Custom Retriever

In some cases you might need to build a custom retriever. This retriever must fit LangChain's requirements (see LangChain's Custom Retriever documentation).

Integrating new VectorStore

Here is an example showing how to add a Postgres vector (PGVector) resource.

Warning

This is an illustrative example and is not expected to work as is. It may require some testing and adjustments.

In rag/resources, add a new file (for instance pgvector.py).

from typing import List, Optional

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_postgres.vectorstores import PGVector
from pydantic import Field

from rag.core.resource import VectorResource

class PGVectorResource(VectorResource):
    connection: str = Field(examples=["postgresql+psycopg://langchain:langchain@localhost:6024/langchain"]) 
    collection_name: str
    embeddings_model: Embeddings
    vectorstore: Optional[PGVector] = None

    def _initiate_vectorstore(self) -> None:
        # Build the PGVector store from this resource's own fields
        self.vectorstore = PGVector(
            embeddings=self.embeddings_model,
            collection_name=self.collection_name,
            connection=self.connection,
            use_jsonb=True,
        )

    def add_documents(self, documents: List[Document]) -> None:
        self.vectorstore.add_documents(documents)

    async def aadd_documents(self, documents: List[Document]) -> None:
        await self.vectorstore.aadd_documents(documents)

Do not forget to add any additional dependencies:

poetry add langchain_postgres

Please also consider updating the documentation and the tests accordingly.