
Front-end integration

When using the RAG application through a JS client, there are two ways to display the response:

  1. Wait for the full response to be available, with the RAG invoke endpoint
  2. Display information as soon as it is available, using the RAG stream endpoint

There are advantages and disadvantages to both methods.

Non-streamed response

Advantages: Simple to implement, uses slightly fewer resources

Disadvantages: Less user-friendly

This method is the simplest to implement and the least resource-hungry, but it lacks interactivity for the end user.

Once your user submits a question, you'll usually display a loading animation, wait for the full response, and then display it before removing the animation. If that's what you need, here is the simplest JS code to query the invoke endpoint.

Just query the endpoint like you always do and display the response.

const response = await fetch('/rag/invoke', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({ question: inputValue }),
});
const data = await response.json();
onResponse(data); // Function responsible for your UI update

For the data format, see the API endpoint.
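
As a usage sketch, here is one way to wire the call above to a page following the flow described earlier (show a loading animation, wait, display, remove the animation). The element ids and the loader helpers are hypothetical placeholders for whatever your page actually uses.

const loader = document.querySelector('#loader'); // hypothetical spinner element
const showLoader = () => { loader.hidden = false; };
const hideLoader = () => { loader.hidden = true; };

document.querySelector('#question-form').addEventListener('submit', async (event) => {
    event.preventDefault();
    const inputValue = document.querySelector('#question-input').value;
    showLoader(); // display the loading animation while we wait
    try {
        const response = await fetch('/rag/invoke', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ question: inputValue }),
        });
        onResponse(await response.json()); // display the full answer
    } finally {
        hideLoader(); // remove the animation even if the request failed
    }
});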

Streamed response

Advantages: The user feels that the system is actually processing their question.

Disadvantages: Heavier to implement, uses slightly more resources, and impacts the page redraw.

When using this query mode, each piece of information is sent to your client as soon as it has been computed.

But ... there's a catch: every chunk you receive contains incomplete, fragmented information.

To make the information easier to process, every chunk you receive is a valid JSON object containing only the data to be added to the complete object.
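
As an illustration, here is a minimal merge sketch. It assumes (this is not guaranteed by the API; check your endpoint's actual chunk format) that each chunk mirrors the shape of the final response, with string fields carrying text fragments to append and nested objects to merge recursively.

function mergeChunk(accumulated, chunk) {
    for (const [key, value] of Object.entries(chunk)) {
        if (typeof value === 'string' && typeof accumulated[key] === 'string') {
            accumulated[key] += value; // append text fragments
        } else if (value && typeof value === 'object' && !Array.isArray(value)) {
            accumulated[key] = mergeChunk(accumulated[key] ?? {}, value); // merge nested objects
        } else {
            accumulated[key] = value; // first occurrence, scalars and arrays: overwrite
        }
    }
    return accumulated;
}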

Let's say you ask the question: "What is the sky color? Give me a brief explanation of one sentence." The answer will arrive spread across many chunks, so you'll need to either process each chunk individually or merge them all together, as sketched below.
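
Here is a sketch of the consuming side with the Fetch API. It assumes the stream endpoint lives at /rag/stream (matching the invoke path above) and delivers newline-delimited JSON chunks; adapt the path and the framing to what your API actually sends. It reuses the hypothetical mergeChunk helper from the previous sketch.

const response = await fetch('/rag/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question: inputValue }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';  // holds partial lines between network reads
let answer = {};  // the complete object, built up chunk by chunk

while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Network reads rarely align with chunk boundaries: parse only
    // the complete lines and keep the remainder for the next read.
    const lines = buffer.split('\n');
    buffer = lines.pop();
    for (const line of lines) {
        if (!line.trim()) continue;
        answer = mergeChunk(answer, JSON.parse(line));
        onResponse(answer); // redraw the UI with the partial answer so far
    }
}

Note that calling onResponse on every chunk can trigger many redraws, which is the page-redraw impact mentioned above; you may want to throttle the updates.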

This approach is trickier to implement, but it's the best way to give your users the sense of interactivity they're used to from GPT-style chat clients.

What you must understand when dealing with async stream responses