Resource
A Resource is an abstract Pydantic BaseModel representing a named object dedicated to the retrieval of documents.
It exposes a single property, retriever, which returns a Langchain BaseRetriever.
In short, a Resource can be anything with a name and a retriever.
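The contract described above can be sketched without any dependencies. This is only an illustration, not RAG-Core's implementation: it uses abc and dataclasses as stand-ins for Pydantic, and the names SketchResource, EchoRetriever and EchoResource are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class EchoRetriever:
    """Hypothetical stand-in for a Langchain BaseRetriever."""

    def invoke(self, query: str) -> List[str]:
        # A real retriever would return Document objects, not strings.
        return [f"echo: {query}"]


@dataclass
class SketchResource(ABC):
    """Stand-in for Resource: a name plus an abstract `retriever` property."""

    name: str

    @property
    @abstractmethod
    def retriever(self) -> EchoRetriever:
        ...


@dataclass
class EchoResource(SketchResource):
    """Concrete resource: it only has to say which retriever it exposes."""

    @property
    def retriever(self) -> EchoRetriever:
        return EchoRetriever()


resource = EchoResource(name="echo")
results = resource.retriever.invoke("hello")
```

Because retriever is abstract, instantiating SketchResource directly raises a TypeError, which loosely mirrors the early feedback the real base class is meant to give.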
Why use Pydantic?
Pydantic allows for type checking and schema validation. This feature ensures that users receive quick feedback if their implementation does not fit the framework requirements, preventing potential issues with other built-in features, such as automatic API generation.
VectorResource
A VectorResource extends Resource, mostly by adding a vectorstore attribute (a subclass of Langchain's VectorStore).
Info
Most VectorResource implementations require an embeddings_model (a subclass of Embeddings) as an input argument.
It is used to initialize Langchain's VectorStore.
Add documents
It also provides a set of abstract methods facilitating document ingestion:
add_documents: Takes a list of documents as input, embeds them with the embeddings_model, and stores them in the vectorstore.
Source
@abstractmethod
def add_documents(self, documents: List[Document]) -> None:
    """Embed documents and add them to the Langchain VectorStore object."""
    pass
aadd_documents: Similar to add_documents, but asynchronous.
Source
@abstractmethod
async def aadd_documents(self, documents: List[Document]) -> None:
    """Embed documents and add them to the Langchain VectorStore object asynchronously."""
    pass
Example
# Create a document to add to the vector store
document_text = "This is a sample document to be added to the vector store."
document_metadata = {"title": "Sample Document", "author": "Author Name"}
document = Document(page_content=document_text, metadata=document_metadata)
# Add the document to the vector store
vector_store.add_documents([document])
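To make the two ingestion methods more concrete, here is a dependency-free sketch of a resource implementing both. Everything in it is an assumption made for illustration: Document is a stand-in for Langchain's class, toy_embed is a fake embedding function, and InMemoryVectorResource is a hypothetical name. Delegating the asynchronous method to a worker thread is one possible design, not necessarily what a real vector store does.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def toy_embed(text: str) -> List[float]:
    # Hypothetical embedding: text length and vowel count. Not a real model.
    vowels = sum(text.lower().count(v) for v in "aeiou")
    return [float(len(text)), float(vowels)]


class InMemoryVectorResource:
    """Toy resource storing (vector, document) pairs in a list."""

    def __init__(self) -> None:
        self.store: List[Tuple[List[float], Document]] = []

    def add_documents(self, documents: List[Document]) -> None:
        # Embed each document, then store it next to its vector.
        for doc in documents:
            self.store.append((toy_embed(doc.page_content), doc))

    async def aadd_documents(self, documents: List[Document]) -> None:
        # Run the synchronous path in a worker thread so the event loop stays free.
        await asyncio.to_thread(self.add_documents, documents)


resource = InMemoryVectorResource()
resource.add_documents([Document(page_content="hello")])
asyncio.run(resource.aadd_documents([Document(page_content="world")]))
```

Both calls end up in the same store, so callers can pick the synchronous or asynchronous entry point depending on their runtime.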
Add texts
Utility method, mostly for educational purposes, that wraps the provided texts in a List[Document] and embeds them via the add_documents method.
Example
# Create a document to add to the vector store
text = "This is a sample document to be added to the vector store."
# Add the document to the vector store
vector_store.add_texts([text])
Also available in asynchronous mode via .aadd_texts:
# Create a document to add to the vector store
text = "This is a sample document to be added to the vector store."
# Add the document to the vector store
await vector_store.aadd_texts([text])
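The wrapping step performed by add_texts could look roughly like the sketch below. It is a guess at the general shape rather than RAG-Core's actual code: Document is a stand-in dataclass and RecordingResource is a hypothetical class that only records what it receives.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class RecordingResource:
    """Hypothetical resource that only records what it is asked to store."""

    def __init__(self) -> None:
        self.documents: List[Document] = []

    def add_documents(self, documents: List[Document]) -> None:
        self.documents.extend(documents)

    def add_texts(self, texts: List[str]) -> None:
        # Wrap each raw string in a Document, then reuse add_documents.
        self.add_documents([Document(page_content=text) for text in texts])


resource = RecordingResource()
resource.add_texts(["first text", "second text"])
```

Texts added this way carry no metadata, which is one reason to prefer add_documents when titles, authors or other fields matter.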
Retriever
RAG-Core allows querying multiple Resources simultaneously.
In order to preserve information about the origin of each retrieved document, the VectorResource retriever automatically adds the name of the VectorResource that provided the documents to each document's metadata.
To do so, the default VectorResource retriever pipes Langchain's retriever into a RunnableLambda (i.e. a custom Python function) that extends the retrieved documents' metadata with a source item. In the source below, that value is the resource's class name.
Source
@property
def retriever(self) -> VectorStoreRetriever:
    assert self.vectorstore is not None, "VectorStore is not initialized"
    return self.vectorstore.as_retriever() | RunnableLambda(
        self.add_source_in_retrived_documents_metadata
    )

def add_source_in_retrived_documents_metadata(
    self, documents: List[Document]
) -> List[Document]:
    for doc in documents:
        doc.metadata["source"] = self.__class__.__name__
    return documents
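To see why the source item matters when several resources are queried together, consider the dependency-free sketch below. It is an illustration under assumptions: Document is a stand-in dataclass, pipe mimics the retriever | RunnableLambda(...) composition with plain functions, and TaggingResource, FaqResource and WikiResource are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


Retriever = Callable[[str], List[Document]]


def pipe(retriever: Retriever, post_process: Callable[[List[Document]], List[Document]]) -> Retriever:
    # Mimics `retriever | RunnableLambda(fn)`: run the retriever, then the function.
    return lambda query: post_process(retriever(query))


class TaggingResource:
    """Hypothetical base: tags every retrieved document with the class name."""

    def _search(self, query: str) -> List[Document]:
        raise NotImplementedError

    def _add_source(self, documents: List[Document]) -> List[Document]:
        for doc in documents:
            doc.metadata["source"] = self.__class__.__name__
        return documents

    @property
    def retriever(self) -> Retriever:
        return pipe(self._search, self._add_source)


class FaqResource(TaggingResource):
    def _search(self, query: str) -> List[Document]:
        return [Document(page_content=f"FAQ entry about {query}")]


class WikiResource(TaggingResource):
    def _search(self, query: str) -> List[Document]:
        return [Document(page_content=f"Wiki page about {query}")]


# Query both resources and merge the results into a single list.
merged = [doc for res in (FaqResource(), WikiResource()) for doc in res.retriever("refunds")]
sources = [doc.metadata["source"] for doc in merged]
```

After merging, each document still records which resource produced it, so downstream steps can cite or filter by origin.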
Custom Resource
Basic example
You can create your own Resource fairly easily by extending the associated class.
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever


class UniversalRetriever(BaseRetriever):
    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        return [Document(page_content="The answer is 42")]


class CustomResource:
    def get_retriever(self, *args, **kwargs):
        return UniversalRetriever()
Custom Retriever
In some cases you might need to build a custom retriever. This retriever must fit Langchain's requirements; see Langchain's custom retriever documentation.
Integrating new VectorStore
Here is an example showing how to add a Postgres vector resource.
Warning
This is an illustrative example and is not expected to work as is. It may require a few tests and adjustments.
In rag/resources, add a new file (for instance pgvector.py):
from typing import List, Optional

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_postgres.vectorstores import PGVector
from pydantic import Field

from rag.core.resource import VectorResource


class PGVectorResource(VectorResource):
    connection: str = Field(examples=["postgresql+psycopg://langchain:langchain@localhost:6024/langchain"])
    collection_name: str
    embeddings_model: Embeddings
    vectorstore: Optional[PGVector] = None

    def _initiate_vectorstore(self) -> None:
        # Build the PGVector store from this resource's own fields.
        self.vectorstore = PGVector(
            embeddings=self.embeddings_model,
            collection_name=self.collection_name,
            connection=self.connection,
            use_jsonb=True,
        )

    def add_documents(self, documents: List[Document]) -> None:
        self.vectorstore.add_documents(documents)

    async def aadd_documents(self, documents: List[Document]) -> None:
        await self.vectorstore.aadd_documents(documents)