Created
September 17, 2024 14:15
-
-
Save davidmezzetti/1fac6bd406857431f1cdc74545bdfba9 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import re | |
from txtai import Embeddings, LLM | |
# Prompt courtesy of the following link: https://github.com/codelion/optillm/blob/main/cot_reflection.py | |
def cot(system, user): | |
system = f""" | |
{system} | |
You are an AI assistant that uses a Chain of Thought (CoT) approach with reflection to answer queries. Follow these steps: | |
1. Think through the problem step by step within the <thinking> tags. | |
2. Reflect on your thinking to check for any errors or improvements within the <reflection> tags. | |
3. Make any necessary adjustments based on your reflection. | |
4. Provide your final, concise answer within the <output> tags. | |
Important: The <thinking> and <reflection> sections are for your internal reasoning process only. | |
Do not include any part of the final answer in these sections. | |
The actual response to the query must be entirely contained within the <output> tags. | |
Use the following format for your response: | |
<thinking> | |
[Your step-by-step reasoning goes here. This is your internal thought process, not the final answer.] | |
<reflection> | |
[Your reflection on your reasoning, checking for errors or improvements] | |
</reflection> | |
[Any adjustments to your thinking based on your reflection] | |
</thinking> | |
<output> | |
[Your final, concise answer to the query. This is the only part that will be shown to the user.] | |
</output> | |
""" | |
# Run LLM inference | |
response = llm([ | |
{"role": "system", "content": system}, | |
{"role": "user", "content": user} | |
], | |
maxlength=4096 | |
) | |
# Extract and return output | |
match = re.search(r"<output>(.*?)(?:</output>|$)", response, re.DOTALL) | |
return match.group(1).strip() if match else response | |
def rag(question): | |
prompt = """ | |
Answer the following question using only the context below. Only include information | |
specifically discussed. | |
question: {question} | |
context: {context} | |
""" | |
# System prompt | |
system = "You are a friendly assistant. You answer questions from users." | |
# RAG context | |
context = "\n".join([x["text"] for x in embeddings.search(question)]) | |
# RAG with CoT + Self-Reflection | |
return cot(system, prompt.format(question=question, context=context)) | |
# Wikipedia Embeddings Index | |
embeddings = Embeddings() | |
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia") | |
# LLM | |
llm = LLM("hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4") | |
# RAG + CoT with Self-Reflection | |
print(rag("Tell me about how jet engines work")) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi mate!
I got this code that’s somewhat similar to yours, but unfortunately, I don’t have the computing power to run it with more parameters, max_depth = 1 and num_rollouts = 3. Maybe you can try it out :)
https://github.com/Staffyeahh/Rag_rSTar_LLM/blob/main/Inference