Off the manual trigger, add an HTTP Request node.
This node deletes the collection so you have a clean collection to work from when ingesting documents end to end, avoiding duplicates from repeated runs. (A request sketch follows the settings below.)
- Edit the request and name it “Delete Collection”
- Method = DELETE
- URL = http://workshoplocal:6333/collections/{{ $workflow.id }}
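If you want to sanity-check this step outside of n8n, here is a minimal sketch of the same request (assuming the same Qdrant host, with a hypothetical collection name standing in for the workflow ID):

```javascript
// Minimal sketch of the same request (Node 18+, global fetch).
// "my-collection" is a hypothetical stand-in for {{ $workflow.id }}.
const res = await fetch("http://workshoplocal:6333/collections/my-collection", {
  method: "DELETE",
});
// Qdrant responds with { result: true, status: "ok", ... } on success.
console.log(res.status, await res.json());
```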
Off the Delete Collection node, add another HTTP request.
This downloads the workshop files that we will process into the vector database and chat with later on. The zip file contains all of the files we will be working with.
- Edit the request and name it “Download Files”
- Method = GET
- URL = https://pub-d7ee5af244994655b124a46e14d6f0b1.r2.dev/pg.zip
Off the Download Files node, add a Compression node.
This step decompresses the archive so we can access each of the files in the zip folder (a rough code equivalent follows the settings below).
- Operation = Decompress
- Input binary field(s) = data
- Output prefix = file_
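Under the hood this step just unzips the downloaded archive into one binary item per file. A rough equivalent outside n8n, assuming the adm-zip package, might look like:

```javascript
// Rough equivalent of the Download Files + Compression steps,
// using the adm-zip package (npm install adm-zip) as an assumption.
import AdmZip from "adm-zip";

const res = await fetch("https://pub-d7ee5af244994655b124a46e14d6f0b1.r2.dev/pg.zip");
const zip = new AdmZip(Buffer.from(await res.arrayBuffer()));

for (const entry of zip.getEntries()) {
  // Each entry maps to one "file_*" binary field on the n8n item.
  console.log(entry.entryName, entry.getData().length, "bytes");
}
```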
Off the Compression node, add a Code node. This reads out the extracted files and converts them into a JSON format that is easier for the downstream nodes to work with.
- Mode = Run Once for All Items
- Language = Javascript
- Code = Below 👇
// Flatten every extracted binary file into its own item, exposing the
// file name as JSON so downstream nodes can work with it.
const results = [];
for (const item of items) {
  for (const key of Object.keys(item.binary)) {
    results.push({
      json: {
        fileName: item.binary[key].fileName,
      },
      binary: {
        data: item.binary[key],
      },
    });
  }
}
return results;
Off the Code node, add a "Limit" node. This allows us to test the workflow more easily without needing to process all of the files every time we run it. We can set the limit to a large number (or remove this node) when we want to process everything.
- Max Items = 5
- Keep = First Items
Adding a Loop Over Items node off the Limit node allows us to loop over all items and set a batch size.
- Batch size = 2
Delete the "Replace Me" placeholder node.
Off the loop, select the + (add) icon.
Then add the Qdrant Vector Store node.
Then, for the operation type, select Add documents to vector store.
- Account (add one)
- Qdrant URL = http://workshoplocal:6333
- Operation Mode = Insert Documents
- Qdrant Collection = By ID
- Expression type
- Value = {{ $workflow.id }}
Adding a Qdrant account - this will be re-used for the chat portion of your application
Finalised Settings
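For reference, "Insert Documents" is effectively writing embedded chunks into Qdrant's points API. A hand-rolled sketch of a single insert (with a placeholder vector, and "my-collection" standing in for the workflow ID) would be:

```javascript
// Hedged sketch of what "Insert Documents" does against Qdrant's REST API.
// The collection must already exist with a matching vector size; the n8n
// node handles creating it for you. The vector below is a fake placeholder;
// the real one comes from the embedding model (1024 dims for mxbai-embed-large).
await fetch("http://workshoplocal:6333/collections/my-collection/points", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    points: [
      {
        id: 1,
        vector: Array(1024).fill(0.01), // placeholder embedding
        payload: { content: "a chunk of a Paul Graham essay" },
      },
    ],
  }),
});
```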
Select the + icon under "Embedding".
The side panel will slide out; select Ollama Embeddings. This allows us to use the open-source models running on a GPU server. However, you could use any of the embedding providers here if you have accounts and API keys set up with them.
Settings for Embeddings Ollama
- Ollama credential
- Base URL = https://ollama.youngs.cloud
- Model = mxbai-embed-large:latest
Add the Ollama credential; this will also be re-used in the chat part of the workshop.
Embeddings node once complete
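To see what this node produces, you can call Ollama's embeddings endpoint directly. A minimal sketch against the workshop server:

```javascript
// Minimal sketch of the call the Embeddings Ollama node makes.
const res = await fetch("https://ollama.youngs.cloud/api/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mxbai-embed-large:latest",
    prompt: "What did Paul Graham say about startups?",
  }),
});
const { embedding } = await res.json();
console.log(embedding.length); // 1024 dimensions for mxbai-embed-large
```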
Now we need to add a data loader. Select the + below Document on your Qdrant node, then select Default Data Loader when the side panel slides out.
- Type of data = Binary
- Mode = Load Specific Data
- Data Format = Automatically detect by Mime Type
- Input Field Name = data
Off the data loader node, we need to add the text splitter. This tells the data loader how big the "chunks" are. This is a useful node for experimenting with chunk size and overlap.
Click the + icon below Text Splitter followed by Token Splitter from the side panel.
- Chunk Size = 450
- Chunk Overlap = 50
⚠️ These chunk sizes roughly map to what the embedding model recommends (no more than 500 tokens, to avoid truncation). Chunk size and overlap can be adjusted to meet your needs and are model dependent; keep the values above for the workshop, and experiment once you are up and running. Read more here.
Completed node settings
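To build some intuition for chunk size and overlap, here is a simplified character-based sketch (the real Token Splitter counts tokens, not characters):

```javascript
// Simplified illustration of chunking with overlap. This sketch uses
// characters purely to show how size and overlap interact.
function splitWithOverlap(text, chunkSize, overlap) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Each chunk repeats the tail of the previous one, so no passage
// is cut off without surrounding context.
console.log(splitWithOverlap("a".repeat(1000), 450, 50).length); // 3 chunks
```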
Not strictly necessary, but let's add some breathing room at the end of each loop/batch of documents processed.
Select the + out to the right of the vector store; we want to add a Wait node.
Add the wait node
- Resume = After time Interval
- Wait Amount = 2.50
- Wait Unit = Seconds
Step 13 - "Closing the loop"
We need to add an action for the “done” portion of the loop
Select the + next to Done on your Loop Over Items node, and add the "No Operation, do nothing" node.
Your loop should look like this.
That completes all of the nodes in the ingest workflow, below is what your workflow should look like
Select the Test Workflow button at the bottom of your window - you will then visually see the workflow execute. If there are errors you will see the error appear on the node that failed.
If you want to see any of the inputs/outputs from the nodes, you can double-click any of them after execution. As an example, here is the output from the token splitter: a large, full-document input split into 7 different chunks by the splitter.
Another example would be the Limit node: you can see 212 source documents in, and 5 output on the right.
Now we want to build the chatbot that uses all of your embeddings and text output from your first flow (the ingest flow).
Off this chat trigger node, add an AI Agent node.
- Agent = Tools Agent
- Source for Prompt (user message) = Define Below
- Prompt (User Message) = {{ $json.chatInput }}
- Options = add a system message, see below
Adding a system message option to the AI Agent Node
For the system message, use the prompt below.
💡 This is also something you can experiment with, such as "always respond like a pirate", to tailor the chat responses from the Large Language Model to your needs.
You are an intelligent assistant specialized in answering user questions using external tools. Your primary goal is to provide precise, contextually relevant, and concise answers based on the tools and resources available.
### TOOL
Use the "article_tool" tool to:
- perform semantic similarity searches and retrieve information from Paul Graham essays relevant to the user's query.
- access detailed information about Paul Graham essays when additional context or specifics are required.
The AI Agent node once configured
This step allows us to bring in our large language model, which is responsible for answering our questions and where we will perform the "RAG" step, by passing in our user prompt (our question) and the context (the related chunks from the vector database).
Click the + below your AI Agent where it says chat model
Followed by the OpenAI Chat Model. Similar to the embeddings model, you can experiment with other chat models or providers like OpenAI or Anthropic here, once you have an account and API keys for those services. Also, because this is an AI agent that utilises "tools", you will need to ensure you use a model that supports tool use.
⚠️ We are using the “OpenAI” chat model here because of a bug in this version of n8n. We will actually configure this node as an Ollama chat model. Follow the guide below.
You will need to add a new OpenAI credential. Note that we will be "faking" this and directing the OpenAI node at our Ollama instance. Add an OpenAI credential with the following settings:
- API Key = 1234567890 (or something random just to fill the field)
- Organisation ID = empty
- Base URL = https://ollama.youngs.cloud/v1
You can then test this credential, ensuring you get the green "Connection tested successfully" box.
Completed “OpenAI” Credential
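This trick works because Ollama exposes an OpenAI-compatible API under /v1, so the OpenAI node can talk to it unchanged. A quick sketch of the kind of request the node sends:

```javascript
// Hedged sketch: Ollama's OpenAI-compatible endpoint, which is why the
// "fake" OpenAI credential with Base URL .../v1 works.
const res = await fetch("https://ollama.youngs.cloud/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer 1234567890", // any value; Ollama ignores it
  },
  body: JSON.stringify({
    model: "llama3.2:latest",
    temperature: 0, // stick to the facts (see the temperature tip below)
    messages: [{ role: "user", content: "Say hello in five words." }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```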
After adding the credential details, save. Then select llama3.2:latest:
- Credential to connect with = New “OpenAI Credential” created above
- Model = llama3.2:latest
- Options
- 💡 Fun to experiment with here is the model "temperature", a scale of 0 to 1: 0 means no creative license, good for very factual data; 1 means lots of creative freedom. Hallucinations may be higher at 1, but if you are building a more creative application then temperature can be useful to configure.
We need to add some memory to the AI Agent system; you can use the Window Buffer Memory, which is non-persistent. Think of this as the AI agent's ability to recall recent conversations.
Under the Memory text, select the + followed by the Window Buffer Memory type
- Session ID = Connect Chat Trigger Node
- Session Key From Previous Node = {{ $json.sessionId }}
- Context Window Length = 40
Completed node settings
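Conceptually, a window buffer memory just keeps the most recent N messages per chat session. A toy sketch of the idea:

```javascript
// Toy sketch of a non-persistent window buffer memory:
// keep only the most recent N messages for each sessionId.
class WindowBufferMemory {
  constructor(windowLength = 40) {
    this.windowLength = windowLength;
    this.sessions = new Map(); // sessionId -> message array
  }

  add(sessionId, message) {
    const history = this.sessions.get(sessionId) ?? [];
    history.push(message);
    // Drop the oldest messages once we exceed the window.
    this.sessions.set(sessionId, history.slice(-this.windowLength));
  }

  get(sessionId) {
    return this.sessions.get(sessionId) ?? [];
  }
}
```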
This allows the AI Agent, on demand (when it decides it needs to), to select this tool to find data in the vector store. The tool will look up your chunked documents based on vector similarity.
Under Tool, select the + followed by Qdrant Vector Store
The credential should default to the one you created during the ingest flow; if not, you can add one with a blank API key and the URL = http://workshoplocal:6333
- Credential to connect with = Qdrantapi Account (existing) or add new
- Operation Mode = Retrieve Documents (As Tool for AI Agent)
- Name = article_tool
- Description = retrieve information about Paul Graham essays
- Qdrant Collection = By ID
- Expression type
- Value = {{ $workflow.id }}
- Limit = 6 (you can play around with this, but be careful about our LLM context window)
Completed node settings
We need to tell the Qdrant node what embedding model to use: when the user query comes in, we "embed" the query the same way we embedded the chunks from the documents. This allows us to find chunks in the database that are similar to the user query (see the sketch after the settings below).
Select the + under the Qdrant node and select Ollama Embeddings
- Credential = existing Ollama account (or add one with the settings described in this document)
- Model = mxbai-embed-large:latest (make sure this matches your ingest embedding model)
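Putting the pieces together, retrieval at query time is: embed the question with the same model, then ask Qdrant for the nearest chunks. A hedged end-to-end sketch ("my-collection" again standing in for the workflow ID):

```javascript
// Hedged sketch of what the retriever tool does at query time.
const question = "How do I cultivate originality?";

// 1. Embed the question with the SAME model used at ingest time.
const embedRes = await fetch("https://ollama.youngs.cloud/api/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "mxbai-embed-large:latest", prompt: question }),
});
const { embedding } = await embedRes.json();

// 2. Ask Qdrant for the 6 most similar chunks (the Limit setting above).
const searchRes = await fetch(
  "http://workshoplocal:6333/collections/my-collection/points/search",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector: embedding, limit: 6, with_payload: true }),
  }
);
const { result } = await searchRes.json();
// Each hit carries the chunk text in its payload plus a similarity score.
console.log(result.map((hit) => hit.score));
```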
On the right hand side of our agent, select the +
Add the "Edit Fields" node type
Add a field and use the following settings
- Name = output
- Value = {{ $json.output }}
When you have all of the nodes complete this is what your chat flow will look like.
Now you can bring all of your hard work together and chat with your documents using your RAG system!
Select open chat from the bottom of your workflow
You can then ask questions and get an answer back and see the tools being used. As with other workflow runs you can view each node execution and the input/output data.
Example: “How does the document suggest cultivating originality and generating new ideas?”
🧪 Time to experiment
- Modify your system prompt, have some fun
- Change the chat model temperature
- Try returning more (or fewer) results from the vector store; be careful of the context window with the small llama3.2 model - we don't want to feed too much in, remember!
- If you have an OpenAI or similar account, try adding it as your chat model. These models are much more powerful than the llama3.2 model we are using here, so you will get better answers, and the context windows are larger.