Docs from Agents HF Course
└── units
    └── en
        ├── _toctree.yml
        ├── bonus-unit1
        │   ├── conclusion.mdx
        │   ├── fine-tuning.mdx
        │   ├── introduction.mdx
        │   └── what-is-function-calling.mdx
        ├── bonus-unit2
        │   ├── introduction.mdx
        │   ├── monitoring-and-evaluating-agents-notebook.mdx
        │   ├── quiz.mdx
        │   └── what-is-agent-observability-and-evaluation.mdx
        ├── communication
        │   ├── live1.mdx
        │   └── next-units.mdx
        ├── unit0
        │   ├── discord101.mdx
        │   ├── introduction.mdx
        │   └── onboarding.mdx
        ├── unit1
        │   ├── README.md
        │   ├── actions.mdx
        │   ├── agent-steps-and-structure.mdx
        │   ├── conclusion.mdx
        │   ├── dummy-agent-library.mdx
        │   ├── final-quiz.mdx
        │   ├── introduction.mdx
        │   ├── messages-and-special-tokens.mdx
        │   ├── observations.mdx
        │   ├── quiz1.mdx
        │   ├── quiz2.mdx
        │   ├── thoughts.mdx
        │   ├── tools.mdx
        │   ├── tutorial.mdx
        │   ├── what-are-agents.mdx
        │   └── what-are-llms.mdx
        ├── unit2
        │   ├── introduction.mdx
        │   ├── langgraph
        │   │   ├── building_blocks.mdx
        │   │   ├── conclusion.mdx
        │   │   ├── document_analysis_agent.mdx
        │   │   ├── first_graph.mdx
        │   │   ├── introduction.mdx
        │   │   ├── quiz1.mdx
        │   │   └── when_to_use_langgraph.mdx
        │   ├── llama-index
        │   │   ├── README.md
        │   │   ├── agents.mdx
        │   │   ├── components.mdx
        │   │   ├── conclusion.mdx
        │   │   ├── introduction.mdx
        │   │   ├── llama-hub.mdx
        │   │   ├── quiz1.mdx
        │   │   ├── quiz2.mdx
        │   │   ├── tools.mdx
        │   │   └── workflows.mdx
        │   └── smolagents
        │       ├── code_agents.mdx
        │       ├── conclusion.mdx
        │       ├── final_quiz.mdx
        │       ├── introduction.mdx
        │       ├── multi_agent_systems.mdx
        │       ├── quiz1.mdx
        │       ├── quiz2.mdx
        │       ├── retrieval_agents.mdx
        │       ├── tool_calling_agents.mdx
        │       ├── tools.mdx
        │       ├── vision_agents.mdx
        │       └── why_use_smolagents.mdx
        ├── unit3
        │   ├── README.md
        │   └── agentic-rag
        │       ├── agent.mdx
        │       ├── agentic-rag.mdx
        │       ├── conclusion.mdx
        │       ├── introduction.mdx
        │       ├── invitees.mdx
        │       └── tools.mdx
        └── unit4
            └── README.md
/units/en/_toctree.yml: | |
-------------------------------------------------------------------------------- | |
1 | - title: Unit 0. Welcome to the course | |
2 | sections: | |
3 | - local: unit0/introduction | |
4 | title: Welcome to the course 🤗 | |
5 | - local: unit0/onboarding | |
6 | title: Onboarding | |
7 | - local: unit0/discord101 | |
8 | title: (Optional) Discord 101 | |
9 | - title: Live 1. How the course works and Q&A | |
10 | sections: | |
11 | - local: communication/live1 | |
12 | title: Live 1. How the course works and Q&A | |
13 | - title: Unit 1. Introduction to Agents | |
14 | sections: | |
15 | - local: unit1/introduction | |
16 | title: Introduction | |
17 | - local: unit1/what-are-agents | |
18 | title: What is an Agent? | |
19 | - local: unit1/quiz1 | |
20 | title: Quick Quiz 1 | |
21 | - local: unit1/what-are-llms | |
22 | title: What are LLMs? | |
23 | - local: unit1/messages-and-special-tokens | |
24 | title: Messages and Special Tokens | |
25 | - local: unit1/tools | |
26 | title: What are Tools? | |
27 | - local: unit1/quiz2 | |
28 | title: Quick Quiz 2 | |
29 | - local: unit1/agent-steps-and-structure | |
30 | title: Understanding AI Agents through the Thought-Action-Observation Cycle | |
31 | - local: unit1/thoughts | |
32 | title: Thought, Internal Reasoning and the ReAct Approach | |
33 | - local: unit1/actions | |
34 | title: Actions, Enabling the Agent to Engage with Its Environment | |
35 | - local: unit1/observations | |
36 | title: Observe, Integrating Feedback to Reflect and Adapt | |
37 | - local: unit1/dummy-agent-library | |
38 | title: Dummy Agent Library | |
39 | - local: unit1/tutorial | |
40 | title: Let’s Create Our First Agent Using smolagents | |
41 | - local: unit1/final-quiz | |
42 | title: Unit 1 Final Quiz | |
43 | - local: unit1/conclusion | |
44 | title: Conclusion | |
45 | - title: Unit 2. Frameworks for AI Agents | |
46 | sections: | |
47 | - local: unit2/introduction | |
48 | title: Frameworks for AI Agents | |
49 | - title: Unit 2.1 The smolagents framework | |
50 | sections: | |
51 | - local: unit2/smolagents/introduction | |
52 | title: Introduction to smolagents | |
53 | - local: unit2/smolagents/why_use_smolagents | |
54 | title: Why use smolagents? | |
55 | - local: unit2/smolagents/quiz1 | |
56 | title: Quick Quiz 1 | |
57 | - local: unit2/smolagents/code_agents | |
58 | title: Building Agents That Use Code | |
59 | - local: unit2/smolagents/tool_calling_agents | |
60 | title: Writing actions as code snippets or JSON blobs | |
61 | - local: unit2/smolagents/tools | |
62 | title: Tools | |
63 | - local: unit2/smolagents/retrieval_agents | |
64 | title: Retrieval Agents | |
65 | - local: unit2/smolagents/quiz2 | |
66 | title: Quick Quiz 2 | |
67 | - local: unit2/smolagents/multi_agent_systems | |
68 | title: Multi-Agent Systems | |
69 | - local: unit2/smolagents/vision_agents | |
70 | title: Vision and Browser agents | |
71 | - local: unit2/smolagents/final_quiz | |
72 | title: Final Quiz | |
73 | - local: unit2/smolagents/conclusion | |
74 | title: Conclusion | |
75 | - title: Unit 2.2 The LlamaIndex framework | |
76 | sections: | |
77 | - local: unit2/llama-index/introduction | |
78 | title: Introduction to LlamaIndex | |
79 | - local: unit2/llama-index/llama-hub | |
80 | title: Introduction to LlamaHub | |
81 | - local: unit2/llama-index/components | |
82 | title: What are Components in LlamaIndex? | |
83 | - local: unit2/llama-index/tools | |
84 | title: Using Tools in LlamaIndex | |
85 | - local: unit2/llama-index/quiz1 | |
86 | title: Quick Quiz 1 | |
87 | - local: unit2/llama-index/agents | |
88 | title: Using Agents in LlamaIndex | |
89 | - local: unit2/llama-index/workflows | |
90 | title: Creating Agentic Workflows in LlamaIndex | |
91 | - local: unit2/llama-index/quiz2 | |
92 | title: Quick Quiz 2 | |
93 | - local: unit2/llama-index/conclusion | |
94 | title: Conclusion | |
95 | - title: Unit 2.3 The LangGraph framework | |
96 | sections: | |
97 | - local: unit2/langgraph/introduction | |
98 | title: Introduction to LangGraph | |
99 | - local: unit2/langgraph/when_to_use_langgraph | |
100 | title: What is LangGraph? | |
101 | - local: unit2/langgraph/building_blocks | |
102 | title: Building Blocks of LangGraph | |
103 | - local: unit2/langgraph/first_graph | |
104 | title: Building Your First LangGraph | |
105 | - local: unit2/langgraph/document_analysis_agent | |
106 | title: Document Analysis Graph | |
107 | - local: unit2/langgraph/quiz1 | |
108 | title: Quick Quiz 1 | |
109 | - local: unit2/langgraph/conclusion | |
110 | title: Conclusion | |
111 | - title: Unit 3. Use Case for Agentic RAG | |
112 | sections: | |
113 | - local: unit3/agentic-rag/introduction | |
114 | title: Introduction to Use Case for Agentic RAG | |
115 | - local: unit3/agentic-rag/agentic-rag | |
116 | title: Agentic Retrieval Augmented Generation (RAG) | |
117 | - local: unit3/agentic-rag/invitees | |
118 | title: Creating a RAG Tool for Guest Stories | |
119 | - local: unit3/agentic-rag/tools | |
120 | title: Building and Integrating Tools for Your Agent | |
121 | - local: unit3/agentic-rag/agent | |
122 | title: Creating Your Gala Agent | |
123 | - local: unit3/agentic-rag/conclusion | |
124 | title: Conclusion | |
125 | - title: Bonus Unit 1. Fine-tuning an LLM for Function-calling | |
126 | sections: | |
127 | - local: bonus-unit1/introduction | |
128 | title: Introduction | |
129 | - local: bonus-unit1/what-is-function-calling | |
130 | title: What is Function Calling? | |
131 | - local: bonus-unit1/fine-tuning | |
132 | title: Let's Fine-Tune your model for Function-calling | |
133 | - local: bonus-unit1/conclusion | |
134 | title: Conclusion | |
135 | - title: Bonus Unit 2. Agent Observability and Evaluation | |
136 | sections: | |
137 | - local: bonus-unit2/introduction | |
138 | title: Introduction | |
139 | - local: bonus-unit2/what-is-agent-observability-and-evaluation | |
140 | title: What is agent observability and evaluation? | |
141 | - local: bonus-unit2/monitoring-and-evaluating-agents-notebook | |
142 | title: Monitoring and evaluating agents | |
143 | - local: bonus-unit2/quiz | |
144 | title: Quiz | |
145 | - title: When will the next steps be published? | |
146 | sections: | |
147 | - local: communication/next-units | |
148 | title: Next Units | |
149 | | |
150 | | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit1/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion [[conclusion]] | |
2 | | |
3 | Congratulations on finishing this first Bonus Unit 🥳 | |
4 | | |
5 | You've just **learned what function-calling is and how to fine-tune a model for function-calling**! | |
6 | | |
7 | If we have one piece of advice now, it’s to try to **fine-tune different models**. The **best way to learn is by trying.** | |
8 | | |
9 | In the next Unit, you're going to learn how to use **state-of-the-art frameworks such as `smolagents`, `LlamaIndex` and `LangGraph`**. | |
10 | | |
11 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog) | |
12 | | |
13 | ### Keep Learning, Stay Awesome 🤗 | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit1/fine-tuning.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Let's Fine-Tune Your Model for Function-Calling | |
2 | | |
3 | We're now ready to fine-tune our first model for function-calling 🔥. | |
4 | | |
5 | ## How do we train our model for function-calling? | |
6 | | |
7 | > Answer: We need **data** | |
8 | | |
9 | A model training process can be divided into 3 steps: | |
10 | | |
11 | 1. **The model is pre-trained on a large quantity of data**. The output of that step is a **pre-trained model**. For instance, [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b). It's a base model and only knows how **to predict the next token without strong instruction following capabilities**. | |
12 | | |
13 | 2. To be useful in a chat context, the model then needs to be **fine-tuned** to follow instructions. In this step, it can be trained by model creators, the open-source community, you, or anyone. For instance, [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) is an instruction-tuned model by the Google Team behind the Gemma project. | |
14 | | |
15 | 3. The model can then be **aligned** to the creator's preferences. For instance, a customer service chat model that must never be impolite to customers. | |
16 | | |
17 | Usually a complete product like Gemini or Mistral **will go through all 3 steps**, whereas the models you can find on Hugging Face have completed one or more steps of this training. | |
18 | | |
19 | In this tutorial, we will build a function-calling model based on [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it). We choose the fine-tuned model [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) instead of the base model [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) because the fine-tuned model has been improved for our use-case. | |
20 | | |
21 | Starting from the pre-trained model **would require more training in order to learn instruction following, chat AND function-calling**. | |
22 | | |
23 | By starting from the instruction-tuned model, **we minimize the amount of information that our model needs to learn**. | |
24 | | |
25 | ## LoRA (Low-Rank Adaptation of Large Language Models) | |
26 | | |
27 | LoRA is a popular and lightweight training technique that significantly **reduces the number of trainable parameters**. | |
28 | | |
29 | It works by **inserting a smaller number of new weights as an adapter into the model to train**. This makes training with LoRA much faster and more memory-efficient, and it produces smaller model weights (a few hundred MB), which are easier to store and share. | |
30 | | |
31 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/blog_multi-lora-serving_LoRA.gif" alt="LoRA inference" width="50%"/> | |
32 | | |
33 | LoRA works by adding pairs of rank decomposition matrices to Transformer layers, typically focusing on linear layers. During training, we will "freeze" the rest of the model and will only update the weights of those newly added adapters. | |
34 | | |
35 | By doing so, the number of **parameters** that we need to train drops considerably as we only need to update the adapter's weights. | |
36 | | |
37 | During inference, the input is passed through both the adapter and the base model; alternatively, the adapter weights can be merged into the base model, resulting in no additional latency overhead. | |
38 | | |
39 | LoRA is particularly useful for adapting **large** language models to specific tasks or domains while keeping resource requirements manageable. This helps reduce the memory **required** to train a model. | |
40 | | |
41 | If you want to learn more about how LoRA works, you should check out this [tutorial](https://huggingface.co/learn/nlp-course/chapter11/4?fw=pt). | |
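To make this concrete, here is a minimal sketch of what attaching a LoRA adapter can look like with the `peft` library. The rank, alpha, and target modules are illustrative assumptions, not the exact values used in the course notebook.

```python
# Minimal LoRA sketch with the `peft` library (illustrative hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

lora_config = LoraConfig(
    r=16,                                 # rank of the decomposition matrices
    lora_alpha=32,                        # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # which linear layers receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# After training, merge_and_unload() folds the adapter back into the base
# weights, so inference carries no additional latency overhead.
```

With `trl`, you can also pass the `LoraConfig` directly to `SFTTrainer` through its `peft_config` argument instead of calling `get_peft_model` yourself.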
42 | | |
43 | ## Fine-Tuning a Model for Function-Calling | |
44 | | |
45 | You can access the tutorial notebook 👉 [here](https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb). | |
46 | | |
47 | Then, click on [Open in Colab](https://colab.research.google.com/#fileId=https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb) to run it in a Colab notebook. | |
48 | | |
49 | | |
50 | | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit1/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction | |
2 | | |
3 |  | |
4 | | |
5 | Welcome to this first **Bonus Unit**, where you'll learn to **fine-tune a Large Language Model (LLM) for function calling**. | |
6 | | |
7 | In terms of LLMs, function calling is quickly becoming a *must-know* technique. | |
8 | | |
9 | The idea is that, rather than relying only on prompt-based approaches as we did in Unit 1, function calling trains your model to **take actions and interpret observations during the training phase**, making your AI more robust. | |
10 | | |
11 | > **When should I do this Bonus Unit?** | |
12 | > | |
13 | > This section is **optional** and is more advanced than Unit 1, so don't hesitate to either do this unit now or revisit it when your knowledge has improved thanks to this course. | |
14 | > | |
15 | > But don't worry, this Bonus Unit is designed to have all the information you need, so we'll walk you through every core concept of fine-tuning a model for function-calling even if you haven't yet learned the inner workings of fine-tuning. | |
16 | | |
17 | The best way for you to follow this Bonus Unit is to: | |
18 | | |
19 | 1. Know how to fine-tune an LLM with Transformers. If you don't yet, [check this](https://huggingface.co/learn/nlp-course/chapter3/1?fw=pt). | |
20 | | |
21 | 2. Know how to use `SFTTrainer` to fine-tune a model. To learn more about it, [check this documentation](https://huggingface.co/learn/nlp-course/en/chapter11/1). | |
22 | | |
23 | --- | |
24 | | |
25 | ## What You’ll Learn | |
26 | | |
27 | 1. **Function Calling** | |
28 | How modern LLMs structure their conversations effectively, letting them trigger **Tools**. | |
29 | | |
30 | 2. **LoRA (Low-Rank Adaptation)** | |
31 | A **lightweight and efficient** fine-tuning method that cuts down on computational and storage overhead. LoRA makes training large models *faster, cheaper, and easier* to deploy. | |
32 | | |
33 | 3. **The Thought → Act → Observe Cycle** in Function Calling models | |
34 | A simple but powerful approach for structuring how your model decides when (and how) to call functions, track intermediate steps, and interpret the results from external Tools or APIs. | |
35 | | |
36 | 4. **New Special Tokens** | |
37 | We’ll introduce **special markers** that help the model distinguish between: | |
38 | - Internal “chain-of-thought” reasoning | |
39 | - Outgoing function calls | |
40 | - Responses coming back from external tools | |
41 | | |
42 | --- | |
43 | | |
44 | By the end of this bonus unit, you’ll be able to: | |
45 | | |
46 | - **Understand** the inner workings of APIs when it comes to Tools. | |
47 | - **Fine-tune** a model using the LoRA technique. | |
48 | - **Implement** and **modify** the Thought → Act → Observe cycle to create robust and maintainable Function-calling workflows. | |
49 | - **Design and utilize** special tokens to seamlessly separate the model’s internal reasoning from its external actions. | |
50 | | |
51 | And you'll **have fine-tuned your own model to do function calling.** 🔥 | |
52 | | |
53 | Let’s dive into **function calling**! | |
54 | | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit1/what-is-function-calling.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # What is Function Calling? | |
2 | | |
3 | Function-calling is a **way for an LLM to take actions on its environment**. It was first [introduced in GPT-4](https://openai.com/index/function-calling-and-other-api-updates/), and was later reproduced in other models. | |
4 | | |
5 | Just like the tools of an Agent, function-calling gives the model the capacity to **take an action on its environment**. However, the function-calling capacity **is learned by the model**, and relies **less on prompting than other agent techniques**. | |
6 | | |
7 | During Unit 1, the Agent **didn't learn to use the Tools**: we just provided the list, and we relied on the fact that the model **was able to generalize on defining a plan using these Tools**. | |
8 | | |
9 | Here, in contrast, **with function-calling, the Agent is fine-tuned (trained) to use Tools**. | |
10 | | |
11 | ## How does the model "learn" to take an action? | |
12 | | |
13 | In Unit 1, we explored the general workflow of an agent. Once the user has given some tools to the agent and prompted it with a query, the model will cycle through: | |
14 | | |
15 | 1. *Think*: What action(s) do I need to take in order to fulfill the objective? | |
16 | 2. *Act*: Format the action with the correct parameters and stop the generation. | |
17 | 3. *Observe*: Get back the result from the execution. | |
18 | | |
19 | In a "typical" conversation with a model through an API, the conversation will alternate between user and assistant messages like this: | |
20 | | |
21 | ```python | |
22 | conversation = [ | |
23 | {"role": "user", "content": "I need help with my order"}, | |
24 | {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"}, | |
25 | {"role": "user", "content": "It's ORDER-123"}, | |
26 | ] | |
27 | ``` | |
28 | | |
29 | Function-calling brings **new roles to the conversation**! | |
30 | | |
31 | 1. One new role for an **Action** | |
32 | 2. One new role for an **Observation** | |
33 | | |
34 | If we take the [Mistral API](https://docs.mistral.ai/capabilities/function_calling/) as an example, it would look like this: | |
35 | | |
36 | ```python | |
37 | conversation = [ | |
38 | { | |
39 | "role": "user", | |
40 | "content": "What's the status of my transaction T1001?" | |
41 | }, | |
42 | { | |
43 | "role": "assistant", | |
44 | "content": "", | |
45 | "function_call": { | |
46 | "name": "retrieve_payment_status", | |
47 | "arguments": "{\"transaction_id\": \"T1001\"}" | |
48 | } | |
49 | }, | |
50 | { | |
51 | "role": "tool", | |
52 | "name": "retrieve_payment_status", | |
53 | "content": "{\"status\": \"Paid\"}" | |
54 | }, | |
55 | { | |
56 | "role": "assistant", | |
57 | "content": "Your transaction T1001 has been successfully paid." | |
58 | } | |
59 | ] | |
60 | ``` | |
61 | | |
62 | > ... But you said there's a new role for function calls? | |
63 | | |
64 | **Yes and no**. In this case, as in many other APIs, the model formats the action to take as an "assistant" message. The chat template then represents this as **special tokens** for function-calling (see the short sketch after the list below): | |
65 | | |
66 | - `[AVAILABLE_TOOLS]` – Start the list of available tools | |
67 | - `[/AVAILABLE_TOOLS]` – End the list of available tools | |
68 | - `[TOOL_CALLS]` – Make a call to a tool (i.e., take an "Action") | |
69 | - `[TOOL_RESULTS]` – "Observe" the result of the action | |
70 | - `[/TOOL_RESULTS]` – End of the observation (i.e., the model can decode again) | |
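To make the layout of these tokens concrete, here is a hand-assembled sketch of roughly where they sit around the conversation above. This is purely illustrative: the exact formatting (including the `[INST]` markers) is produced by the model's chat template, not something you write by hand.

```python
# Illustrative only: a hand-written prompt showing where the special
# function-calling tokens sit. Real prompts are rendered by the chat template.
tools_json = '[{"name": "retrieve_payment_status", "parameters": {"transaction_id": "string"}}]'

prompt = (
    "[AVAILABLE_TOOLS]" + tools_json + "[/AVAILABLE_TOOLS]"
    "[INST] What's the status of my transaction T1001? [/INST]"
    '[TOOL_CALLS][{"name": "retrieve_payment_status", "arguments": {"transaction_id": "T1001"}}]'
    '[TOOL_RESULTS]{"status": "Paid"}[/TOOL_RESULTS]'
    "Your transaction T1001 has been successfully paid."
)
print(prompt)
```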
71 | | |
72 | We'll talk again about function-calling in this course, but if you want to dive deeper you can check [this excellent documentation section](https://docs.mistral.ai/capabilities/function_calling/). | |
73 | | |
74 | --- | |
75 | Now that we've learned what function-calling is and how it works, let's **add function-calling capabilities to a model that does not have them yet**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), by appending some new special tokens to the model. | |
76 | | |
77 | To be able to do that, **we first need to understand fine-tuning and LoRA**. | |
78 | | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit2/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # AI Agent Observability & Evaluation | |
2 | | |
3 |  | |
4 | | |
5 | Welcome to **Bonus Unit 2**! In this chapter, you'll explore advanced strategies for observing, evaluating, and ultimately improving the performance of your agents. | |
6 | | |
7 | --- | |
8 | | |
9 | ## 📚 When Should I Do This Bonus Unit? | |
10 | | |
11 | This bonus unit is perfect if you: | |
12 | - **Develop and Deploy AI Agents:** You want to ensure that your agents are performing reliably in production. | |
13 | - **Need Detailed Insights:** You're looking to diagnose issues, optimize performance, or understand the inner workings of your agent. | |
14 | - **Aim to Reduce Operational Overhead:** By monitoring agent costs, latency, and execution details, you can efficiently manage resources. | |
15 | - **Seek Continuous Improvement:** You’re interested in integrating both real-time user feedback and automated evaluation into your AI applications. | |
16 | | |
17 | In short, it's for everyone who wants to put their agents in front of users! | |
18 | | |
19 | --- | |
20 | | |
21 | ## 🤓 What You’ll Learn | |
22 | | |
23 | In this unit, you'll learn: | |
24 | - **Instrument Your Agent:** Learn how to integrate observability tools via OpenTelemetry with the *smolagents* framework. | |
25 | - **Monitor Metrics:** Track performance indicators such as token usage (costs), latency, and error traces. | |
26 | - **Evaluate in Real-Time:** Understand techniques for live evaluation, including gathering user feedback and leveraging an LLM-as-a-judge. | |
27 | - **Offline Analysis:** Use benchmark datasets (e.g., GSM8K) to test and compare agent performance. | |
28 | | |
29 | --- | |
30 | | |
31 | ## 🚀 Ready to Get Started? | |
32 | | |
33 | In the next section, you'll learn the basics of Agent Observability and Evaluation. After that, it's time to see it in action! | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit2/quiz.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Quiz: Evaluating AI Agents | |
2 | | |
3 | Let's assess your understanding of the agent tracing and evaluation concepts covered in this bonus unit. | |
4 | | |
5 | This quiz is optional and ungraded. | |
6 | | |
7 | ### Q1: What does observability in AI agents primarily refer to? | |
8 | Which statement accurately describes the purpose of observability for AI agents? | |
9 | | |
10 | <Question | |
11 | choices={[ | |
12 | { | |
13 | text: "It involves tracking internal operations through logs, metrics, and spans to understand agent behavior.", | |
14 | explain: "Correct! Observability means using logs, metrics, and spans to shed light on the inner workings of the agent.", | |
15 | correct: true | |
16 | }, | |
17 | { | |
18 | text: "It is solely focused on reducing the financial cost of running the agent.", | |
19 | explain: "Observability covers cost but is not limited to it." | |
20 | }, | |
21 | { | |
22 | text: "It refers only to the external appearance and UI of the agent.", | |
23 | explain: "Observability is about the internal processes, not the UI." | |
24 | }, | |
25 | { | |
26 | text: "It is concerned with coding style and code aesthetics only.", | |
27 | explain: "Code style is unrelated to observability in this context." | |
28 | } | |
29 | ]} | |
30 | /> | |
31 | | |
32 | ### Q2: Which of the following is NOT a common metric monitored in agent observability? | |
33 | Select the metric that does not typically fall under the observability umbrella. | |
34 | | |
35 | <Question | |
36 | choices={[ | |
37 | { | |
38 | text: "Latency", | |
39 | explain: "Latency is commonly tracked to assess agent responsiveness." | |
40 | }, | |
41 | { | |
42 | text: "Cost per Agent Run", | |
43 | explain: "Monitoring cost is a key aspect of observability." | |
44 | }, | |
45 | { | |
46 | text: "User Feedback and Ratings", | |
47 | explain: "User feedback is crucial for evaluating agent performance." | |
48 | }, | |
49 | { | |
50 | text: "Lines of Code of the Agent", | |
51 | explain: "The number of lines of code is not a typical observability metric.", | |
52 | correct: true | |
53 | } | |
54 | ]} | |
55 | /> | |
56 | | |
57 | ### Q3: What best describes offline evaluation of an AI agent? | |
58 | Determine the statement that correctly captures the essence of offline evaluation. | |
59 | | |
60 | <Question | |
61 | choices={[ | |
62 | { | |
63 | text: "Evaluating the agent using real user interactions in a live environment.", | |
64 | explain: "This describes online evaluation rather than offline." | |
65 | }, | |
66 | { | |
67 | text: "Assessing agent performance using curated datasets with known ground truth.", | |
68 | explain: "Correct! Offline evaluation uses test datasets to gauge performance against known answers.", | |
69 | correct: true | |
70 | }, | |
71 | { | |
72 | text: "Monitoring the agent's internal logs in real-time.", | |
73 | explain: "This is more related to observability rather than evaluation." | |
74 | }, | |
75 | { | |
76 | text: "Running the agent without any evaluation metrics.", | |
77 | explain: "This approach does not provide meaningful insights." | |
78 | } | |
79 | ]} | |
80 | /> | |
81 | | |
82 | ### Q4: Which advantage does online evaluation of agents offer? | |
83 | Pick the statement that best reflects the benefit of online evaluation. | |
84 | | |
85 | <Question | |
86 | choices={[ | |
87 | { | |
88 | text: "It provides controlled testing scenarios using pre-defined datasets.", | |
89 | explain: "Controlled testing is a benefit of offline evaluation, not online." | |
90 | }, | |
91 | { | |
92 | text: "It captures live user interactions and real-world performance data.", | |
93 | explain: "Correct! Online evaluation offers insights by monitoring the agent in a live setting.", | |
94 | correct: true | |
95 | }, | |
96 | { | |
97 | text: "It eliminates the need for any offline testing and benchmarks.", | |
98 | explain: "Both offline and online evaluations are important and complementary." | |
99 | }, | |
100 | { | |
101 | text: "It solely focuses on reducing the computational cost of the agent.", | |
102 | explain: "Cost monitoring is part of observability, not the primary advantage of online evaluation." | |
103 | } | |
104 | ]} | |
105 | /> | |
106 | | |
107 | ### Q5: What role does OpenTelemetry play in AI agent observability and evaluation? | |
108 | Which statement best describes the role of OpenTelemetry in monitoring AI agents? | |
109 | | |
110 | <Question | |
111 | choices={[ | |
112 | { | |
113 | text: "It provides a standardized framework to instrument code, enabling the collection of traces, metrics, and logs for observability.", | |
114 | explain: "Correct! OpenTelemetry standardizes instrumentation for telemetry data, which is crucial for monitoring and diagnosing agent behavior.", | |
115 | correct: true | |
116 | }, | |
117 | { | |
118 | text: "It acts as a replacement for manual debugging by automatically fixing code issues.", | |
119 | explain: "Incorrect. OpenTelemetry is used for gathering telemetry data, not for debugging code issues." | |
120 | }, | |
121 | { | |
122 | text: "It primarily serves as a database for storing historical logs without real-time capabilities.", | |
123 | explain: "Incorrect. OpenTelemetry focuses on real-time telemetry data collection and exporting data to analysis tools." | |
124 | }, | |
125 | { | |
126 | text: "It is used to optimize the computational performance of the AI agent by automatically tuning model parameters.", | |
127 | explain: "Incorrect. OpenTelemetry is centered on observability rather than performance tuning." | |
128 | } | |
129 | ]} | |
130 | /> | |
131 | | |
132 | Congratulations on completing this quiz! 🎉 If you missed any questions, consider reviewing the content of this bonus unit for a deeper understanding. If you did well, you're ready to explore more advanced topics in agent observability and evaluation! | |
133 | | |
-------------------------------------------------------------------------------- | |
/units/en/bonus-unit2/what-is-agent-observability-and-evaluation.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # AI Agent Observability and Evaluation | |
2 | | |
3 | ## 🔎 What is Observability? | |
4 | | |
5 | Observability is about understanding what's happening inside your AI agent by looking at external signals like logs, metrics, and traces. For AI agents, this means tracking actions, tool usage, model calls, and responses to debug and improve agent performance. | |
6 | | |
7 |  | |
8 | | |
9 | ## 🔭 Why Agent Observability Matters | |
10 | | |
11 | Without observability, AI agents are "black boxes." Observability tools make agents transparent, enabling you to: | |
12 | | |
13 | - Understand costs and accuracy trade-offs | |
14 | - Measure latency | |
15 | - Detect harmful language & prompt injection | |
16 | - Monitor user feedback | |
17 | | |
18 | In other words, it makes your demo agent ready for production! | |
19 | | |
20 | ## 🔨 Observability Tools | |
21 | | |
22 | Common observability tools for AI agents include platforms like [Langfuse](https://langfuse.com) and [Arize](https://www.arize.com). These tools help collect detailed traces and offer dashboards to monitor metrics in real-time, making it easy to detect problems and optimize performance. | |
23 | | |
24 | Observability tools vary widely in their features and capabilities. Some tools are open source, benefiting from large communities that shape their roadmaps and extensive integrations. Additionally, certain tools specialize in specific aspects of LLMOps—such as observability, evaluations, or prompt management—while others are designed to cover the entire LLMOps workflow. We encourage you to explore the documentation of different options to pick a solution that works well for you. | |
25 | | |
26 | Many agent frameworks such as [smolagents](https://huggingface.co/docs/smolagents/v1.12.0/en/index) use the [OpenTelemetry](https://opentelemetry.io/docs/) standard to expose metadata to observability tools. In addition to this, observability tools build custom instrumentations to allow for more flexibility in the fast-moving world of LLMs. You should check the documentation of the tool you are using to see what is supported. | |
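As a hedged sketch, instrumenting a smolagents agent with OpenTelemetry typically looks something like the following. The `openinference-instrumentation-smolagents` package and the OTLP exporter settings are assumptions based on common setups; check your observability tool's documentation for its exact recipe.

```python
# Sketch: export smolagents runs as OpenTelemetry traces.
# Assumed packages: opentelemetry-sdk, opentelemetry-exporter-otlp,
# openinference-instrumentation-smolagents.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

provider = TracerProvider()
# The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT and headers from the
# environment, e.g. pointing at Langfuse, Arize Phoenix, or any OTLP backend.
provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))

SmolagentsInstrumentor().instrument(tracer_provider=provider)
# From here on, each agent run is exported as a trace, with every model call
# and tool call showing up as a span.
```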
27 | | |
28 | ## 🔬 Traces and Spans | |
29 | | |
30 | Observability tools usually represent agent runs as traces and spans. | |
31 | | |
32 | - **Traces** represent a complete agent task from start to finish (like handling a user query). | |
33 | - **Spans** are individual steps within the trace (like calling a language model or retrieving data). | |
34 | | |
35 |  | |
36 | | |
37 | ## 📊 Key Metrics to Monitor | |
38 | | |
39 | Here are some of the most common metrics that observability tools monitor: | |
40 | | |
41 | **Latency:** How quickly does the agent respond? Long waiting times negatively impact user experience. You should measure latency for tasks and individual steps by tracing agent runs. For example, an agent that takes 20 seconds for all model calls could be accelerated by using a faster model or by running model calls in parallel. | |
42 | | |
43 | **Costs:** What’s the expense per agent run? AI agents rely on LLM calls billed per token or external APIs. Frequent tool usage or multiple prompts can rapidly increase costs. For instance, if an agent calls an LLM five times for marginal quality improvement, you must assess if the cost is justified or if you could reduce the number of calls or use a cheaper model. Real-time monitoring can also help identify unexpected spikes (e.g., bugs causing excessive API loops). | |
44 | | |
45 | **Request Errors:** How many requests did the agent fail? This can include API errors or failed tool calls. To make your agent more robust against these in production, you can set up fallbacks or retries. For example, if LLM provider A is down, you can switch to LLM provider B as a backup. | |
46 | | |
47 | **User Feedback:** Implementing direct user evaluations provides valuable insights. This can include explicit ratings (👍thumbs-up/👎down, ⭐1-5 stars) or textual comments. Consistent negative feedback should alert you, as it is a sign that the agent is not working as expected. | |
48 | | |
49 | **Implicit User Feedback:** User behaviors provide indirect feedback even without explicit ratings. This can include immediate question rephrasing, repeated queries, or clicking a retry button. For example, if you see that users repeatedly ask the same question, this is a sign that the agent is not working as expected. | |
50 | | |
51 | **Accuracy:** How frequently does the agent produce correct or desirable outputs? Accuracy definitions vary (e.g., problem-solving correctness, information retrieval accuracy, user satisfaction). The first step is to define what success looks like for your agent. You can track accuracy via automated checks, evaluation scores, or task completion labels. For example, marking traces as "succeeded" or "failed". | |
52 | | |
53 | **Automated Evaluation Metrics:** You can also set up automated evals. For instance, you can use an LLM to score the output of the agent e.g. if it is helpful, accurate, or not. There are also several open source libraries that help you to score different aspects of the agent. E.g. [RAGAS](https://docs.ragas.io/) for RAG agents or [LLM Guard](https://llm-guard.com/) to detect harmful language or prompt injection. | |
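As a minimal sketch of the LLM-as-a-judge idea (the judge model and the 1-5 scale below are arbitrary illustrative choices, not a recommendation):

```python
# Hedged LLM-as-a-judge sketch: ask another model to grade an agent answer.
from huggingface_hub import InferenceClient

judge = InferenceClient(model="meta-llama/Llama-3.3-70B-Instruct")  # any capable chat model

def judge_helpfulness(question: str, agent_answer: str) -> str:
    prompt = (
        "On a scale of 1 to 5, rate how helpful and accurate the answer is. "
        "Reply with the number only.\n\n"
        f"Question: {question}\nAnswer: {agent_answer}"
    )
    response = judge.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=4,
    )
    return response.choices[0].message.content

# Example usage: attach the returned score to the corresponding trace in your
# observability tool so you can filter for low-scoring runs.
# score = judge_helpfulness("What's 15% of 80?", "15% of 80 is 12.")
```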
54 | | |
55 | In practice, a combination of these metrics gives the best coverage of an AI agent’s health. In this chapter's [example notebook](https://huggingface.co/learn/agents-course/en/bonus-unit2/monitoring-and-evaluating-agents-notebook), we'll show you how these metrics look in real examples, but first, we'll learn what a typical evaluation workflow looks like. | |
56 | | |
57 | ## 👍 Evaluating AI Agents | |
58 | | |
59 | Observability gives us metrics, but evaluation is the process of analyzing that data (and performing tests) to determine how well an AI agent is performing and how it can be improved. In other words, once you have those traces and metrics, how do you use them to judge the agent and make decisions? | |
60 | | |
61 | Regular evaluation is important because AI agents are often non-deterministic and can evolve (through updates or drifting model behavior) – without evaluation, you wouldn’t know if your “smart agent” is actually doing its job well or if it’s regressed. | |
62 | | |
63 | There are two categories of evaluations for AI agents: **online evaluation** and **offline evaluation**. Both are valuable, and they complement each other. We usually begin with offline evaluation, as this is the minimum necessary step before deploying any agent. | |
64 | | |
65 | ### 🥷 Offline Evaluation | |
66 | | |
67 |  | |
68 | | |
69 | This involves evaluating the agent in a controlled setting, typically using test datasets, not live user queries. You use curated datasets where you know what the expected output or correct behavior is, and then run your agent on those. | |
70 | | |
71 | For instance, if you built a math word-problem agent, you might have a [test dataset](https://huggingface.co/datasets/gsm8k) of 100 problems with known answers. Offline evaluation is often done during development (and can be part of CI/CD pipelines) to check improvements or guard against regressions. The benefit is that it’s **repeatable and you can get clear accuracy metrics since you have ground truth**. You might also simulate user queries and measure the agent’s responses against ideal answers or use automated metrics as described above. | |
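A minimal offline-evaluation loop over such a dataset could look like the sketch below; `run_agent` is a hypothetical stand-in for however you invoke your own agent.

```python
# Sketch: score an agent on a small GSM8K slice against the known answers.
from datasets import load_dataset

eval_set = load_dataset("gsm8k", "main", split="test[:20]")

def run_agent(question: str) -> str:
    """Hypothetical placeholder: call your agent and return its final answer."""
    raise NotImplementedError

correct = 0
for example in eval_set:
    prediction = run_agent(example["question"])
    reference = example["answer"].split("####")[-1].strip()  # GSM8K's final answer
    correct += reference in prediction

print(f"Accuracy on the slice: {correct / len(eval_set):.2%}")
```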
72 | | |
73 | The key challenge with offline eval is ensuring your test dataset is comprehensive and stays relevant – the agent might perform well on a fixed test set but encounter very different queries in production. Therefore, you should keep test sets updated with new edge cases and examples that reflect real-world scenarios. A mix of small “smoke test” cases and larger evaluation sets is useful: small sets for quick checks and larger ones for broader performance metrics. | |
74 | | |
75 | ### 🔄 Online Evaluation | |
76 | | |
77 | This refers to evaluating the agent in a live, real-world environment, i.e. during actual usage in production. Online evaluation involves monitoring the agent’s performance on real user interactions and analyzing outcomes continuously. | |
78 | | |
79 | For example, you might track success rates, user satisfaction scores, or other metrics on live traffic. The advantage of online evaluation is that it **captures things you might not anticipate in a lab setting** – you can observe model drift over time (if the agent’s effectiveness degrades as input patterns shift) and catch unexpected queries or situations that weren’t in your test data. It provides a true picture of how the agent behaves in the wild. | |
80 | | |
81 | Online evaluation often involves collecting implicit and explicit user feedback, as discussed, and possibly running shadow tests or A/B tests (where a new version of the agent runs in parallel to compare against the old one). The challenge is that it can be tricky to get reliable labels or scores for live interactions – you might rely on user feedback or downstream metrics (like whether the user clicked the result). | |
82 | | |
83 | ### 🤝 Combining the two | |
84 | | |
85 | In practice, successful AI agent evaluation blends **online** and **offline** methods. You might run regular offline benchmarks to quantitatively score your agent on defined tasks and continuously monitor live usage to catch things the benchmarks miss. For example, offline tests can catch if a code-generation agent’s success rate on a known set of problems is improving, while online monitoring might alert you that users have started asking a new category of question that the agent struggles with. Combining both gives a more robust picture. | |
86 | | |
87 | In fact, many teams adopt a loop: _offline evaluation → deploy new agent version → monitor online metrics and collect new failure examples → add those examples to offline test set → iterate_. This way, evaluation is continuous and ever-improving. | |
88 | | |
89 | ## 🧑‍💻 Let's see how this works in practice | |
90 | | |
91 | In the next section, we'll see examples of how we can use observability tools to monitor and evaluate our agent. | |
92 | | |
93 | | |
94 | | |
-------------------------------------------------------------------------------- | |
/units/en/communication/live1.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Live 1: How the Course Works and First Q&A | |
2 | | |
3 | In this first live stream of the Agents Course, we explained how the course **works** (scope, units, challenges, and more) and answered your questions. | |
4 | | |
5 | <iframe width="560" height="315" src="https://www.youtube.com/embed/iLVyYDbdSmM?si=TCX5Ai3uZuKLXq45" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | |
6 | | |
7 | To know when the next live session is scheduled, check our **Discord server**. We will also send you an email. If you can’t participate, don’t worry, we **record all live sessions**. | |
8 | | |
-------------------------------------------------------------------------------- | |
/units/en/communication/next-units.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # When will the next units be published? | |
2 | | |
3 | Here's the publication schedule: | |
4 | | |
5 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/next-units.jpg" alt="Next Units" width="100%"/> | |
6 | | |
7 | Don't forget to <a href="https://bit.ly/hf-learn-agents">sign up for the course</a>! By signing up, **we can send you the links as each unit is published, along with updates and details about upcoming challenges**. | |
8 | | |
9 | Keep Learning, Stay Awesome 🤗 | |
-------------------------------------------------------------------------------- | |
/units/en/unit0/discord101.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # (Optional) Discord 101 [[discord-101]] | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/discord-etiquette.jpg" alt="The Discord Etiquette" width="100%"/> | |
4 | | |
5 | This guide is designed to help you get started with Discord, a free chat platform popular in the gaming and ML communities. | |
6 | | |
7 | Join the Hugging Face Community Discord server, which **has over 100,000 members**, by clicking <a href="https://discord.gg/UrrTSsSyjb" target="_blank">here</a>. It's a great place to connect with others! | |
8 | | |
9 | ## The Agents course on Hugging Face's Discord Community | |
10 | | |
11 | Starting on Discord can be a bit overwhelming, so here's a quick guide to help you navigate. | |
12 | | |
13 | <!-- Not the case anymore, you'll be prompted to choose your interests. Be sure to select **"AI Agents"** to gain access to the AI Agents Category, which includes all the course-related channels. Feel free to explore and join additional channels if you wish! 🚀--> | |
14 | | |
15 | The HF Community Server hosts a vibrant community with interests in various areas, offering opportunities for learning through paper discussions, events, and more. | |
16 | | |
17 | After [signing up](http://hf.co/join/discord), introduce yourself in the `#introduce-yourself` channel. | |
18 | | |
19 | We created 4 channels for the Agents Course: | |
20 | | |
21 | - `agents-course-announcements`: for the **latest course information**. | |
22 | - `🎓-agents-course-general`: for **general discussions and chitchat**. | |
23 | - `agents-course-questions`: to **ask questions and help your classmates**. | |
24 | - `agents-course-showcase`: to **show your best agents**. | |
25 | | |
26 | In addition you can check: | |
27 | | |
28 | - `smolagents`: for **discussion and support with the library**. | |
29 | | |
30 | ## Tips for using Discord effectively | |
31 | | |
32 | ### How to join a server | |
33 | | |
34 | If you are less familiar with Discord, you might want to check out this <a href="https://support.discord.com/hc/en-us/articles/360034842871-How-do-I-join-a-Server#h_01FSJF9GT2QJMS2PRAW36WNBS8" target="_blank">guide</a> on how to join a server. | |
35 | | |
36 | Here's a quick summary of the steps: | |
37 | | |
38 | 1. Click on the <a href="https://discord.gg/UrrTSsSyjb" target="_blank">Invite Link</a>. | |
39 | 2. Sign in with your Discord account, or create an account if you don't have one. | |
40 | 3. Validate that you are not an AI agent! | |
41 | 4. Set up your nickname and avatar. | |
42 | 5. Click "Join Server". | |
43 | | |
44 | ### How to use Discord effectively | |
45 | | |
46 | Here are a few tips for using Discord effectively: | |
47 | | |
48 | - **Voice channels** are available, though text chat is more commonly used. | |
49 | - You can format text using **markdown style**, which is especially useful for writing code. Note that markdown doesn't work as well for links. | |
50 | - Consider opening threads for **long conversations** to keep discussions organized. | |
51 | | |
52 | We hope you find this guide helpful! If you have any questions, feel free to ask us on Discord 🤗. | |
53 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit0/onboarding.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Onboarding: Your First Steps ⛵ | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/time-to-onboard.jpg" alt="Time to Onboard" width="100%"/> | |
4 | | |
5 | Now that you have all the details, let's get started! We're going to do four things: | |
6 | | |
7 | 1. **Create your Hugging Face Account** if it's not already done | |
8 | 2. **Sign up to Discord and introduce yourself** (don't be shy 🤗) | |
9 | 3. **Follow the Hugging Face Agents Course** on the Hub | |
10 | 4. **Spread the word** about the course | |
11 | | |
12 | ### Step 1: Create Your Hugging Face Account | |
13 | | |
14 | (If you haven't already) create a Hugging Face account <a href='https://huggingface.co/join' target='_blank'>here</a>. | |
15 | | |
16 | ### Step 2: Join Our Discord Community | |
17 | | |
18 | 👉🏻 Join our Discord server <a href="https://discord.gg/UrrTSsSyjb" target="_blank">here</a>. | |
19 | | |
20 | When you join, remember to introduce yourself in `#introduce-yourself`. | |
21 | | |
22 | We have multiple AI Agent-related channels: | |
23 | - `agents-course-announcements`: for the **latest course information**. | |
24 | - `🎓-agents-course-general`: for **general discussions and chitchat**. | |
25 | - `agents-course-questions`: to **ask questions and help your classmates**. | |
26 | - `agents-course-showcase`: to **show your best agents**. | |
27 | | |
28 | In addition you can check: | |
29 | | |
30 | - `smolagents`: for **discussion and support with the library**. | |
31 | | |
32 | If this is your first time using Discord, we wrote a Discord 101 with best practices. Check [the next section](discord101). | |
33 | | |
34 | ### Step 3: Follow the Hugging Face Agent Course Organization | |
35 | | |
36 | Stay up to date with the latest course materials, updates, and announcements **by following the Hugging Face Agents Course Organization**. | |
37 | | |
38 | 👉 Go <a href="https://huggingface.co/agents-course" target="_blank">here</a> and click on **follow**. | |
39 | | |
40 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/hf_course_follow.gif" alt="Follow" width="100%"/> | |
41 | | |
42 | ### Step 4: Spread the word about the course | |
43 | | |
44 | Help us make this course more visible! There are two ways you can help us: | |
45 | | |
46 | 1. Show your support by giving a ⭐ to <a href="https://github.com/huggingface/agents-course" target="_blank">the course's repository</a>. | |
47 | | |
48 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/please_star.gif" alt="Repo star"/> | |
49 | | |
50 | 2. Share Your Learning Journey: Let others **know you're taking this course**! We've prepared an illustration you can use in your social media posts. | |
51 | | |
52 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png" /> | |
53 | | |
54 | You can download the image by clicking 👉 [here](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png?download=true) | |
55 | | |
56 | ### Step 5: Running Models Locally with Ollama (In case you run into Credit limits) | |
57 | | |
58 | 1. **Install Ollama** | |
59 | | |
60 | Follow the official instructions <a href="https://ollama.com/download" target="_blank">here</a>. | |
61 | | |
62 | 2. **Pull a model Locally** | |
63 | ``` bash | |
64 | ollama pull qwen2:7b  # Check the Ollama website for more models | |
65 | ``` | |
66 | 3. **Start Ollama in the background (In one terminal)** | |
67 | ``` bash | |
68 | ollama serve | |
69 | ``` | |
70 | 4. **Use `LiteLLMModel` Instead of `HfApiModel`** | |
71 | ```python | |
72 | from smolagents import LiteLLMModel | |
73 | | |
74 | model = LiteLLMModel( | |
75 | model_id="ollama_chat/qwen2:7b", # Or try other Ollama-supported models | |
76 | api_base="http://127.0.0.1:11434", # Default Ollama local server | |
77 | num_ctx=8192, | |
78 | ) | |
79 | ``` | |
80 | | |
81 | 5. **Why does this work?** | |
82 | - Ollama serves models locally using an OpenAI-compatible API at `http://localhost:11434`. | |
83 | - `LiteLLMModel` is built to communicate with any model that supports the OpenAI chat/completion API format. | |
84 | - This means you can simply swap out `HfApiModel` for `LiteLLMModel`, with no other code changes required. It’s a seamless, plug-and-play solution; the quick sketch below shows the swapped-in model in use. | |
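As a quick usage sketch (reusing the `model` object defined in step 4; the search tool is just an example), the Ollama-backed model plugs into a smolagents agent like any other model:

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # example tool; any smolagents tools work
    model=model,                     # the LiteLLMModel from step 4 above
)

agent.run("What is the capital of France?")
```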
85 | | |
86 | Congratulations! 🎉 **You've completed the onboarding process**! You're now ready to start learning about AI Agents. Have fun! | |
87 | | |
88 | Keep Learning, stay awesome 🤗 | |
89 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/README.md: | |
-------------------------------------------------------------------------------- | |
1 | # Table of Contents | |
2 | | |
3 | You can access Unit 1 on hf.co/learn 👉 <a href="https://hf.co/learn/agents-course/unit1/introduction">here</a> | |
4 | | |
5 | <!-- | |
6 | | Title | Description | | |
7 | |-------|-------------| | |
8 | | [Definition of an Agent](1_definition_of_an_agent.md) | General example of what agents can do without technical jargon. | | |
9 | | [Explain LLMs](2_explain_llms.md) | Explanation of Large Language Models, including the family tree of models and suitable models for agents. | | |
10 | | [Messages and Special Tokens](3_messages_and_special_tokens.md) | Explanation of messages, special tokens, and chat-template usage. | | |
11 | | [Dummy Agent Library](4_dummy_agent_library.md) | Introduction to using a dummy agent library and serverless API. | | |
12 | | [Tools](5_tools.md) | Overview of Pydantic for agent tools and other common tool formats. | | |
13 | | [Agent Steps and Structure](6_agent_steps_and_structure.md) | Steps involved in an agent, including thoughts, actions, observations, and a comparison between code agents and JSON agents. | | |
14 | | [Thoughts](7_thoughts.md) | Explanation of thoughts and the ReAct approach. | | |
15 | | [Actions](8_actions.md) | Overview of actions and stop and parse approach. | | |
16 | | [Observations](9_observations.md) | Explanation of observations and append result to reflect. | | |
17 | | [Quizz](10_quizz.md) | Contains quizzes to test understanding of the concepts. | | |
18 | | [Simple Use Case](11_simple_use_case.md) | Provides a simple use case exercise using datetime and a Python function as a tool. | | |
19 | --> | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/actions.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Actions: Enabling the Agent to Engage with Its Environment | |
2 | | |
3 | <Tip> | |
4 | In this section, we explore the concrete steps an AI agent takes to interact with its environment. | |
5 | | |
6 | We’ll cover how actions are represented (using JSON or code), the importance of the stop and parse approach, and introduce different types of agents. | |
7 | </Tip> | |
8 | | |
9 | Actions are the concrete steps an **AI agent takes to interact with its environment**. | |
10 | | |
11 | Whether it’s browsing the web for information or controlling a physical device, each action is a deliberate operation executed by the agent. | |
12 | | |
13 | For example, an agent assisting with customer service might retrieve customer data, offer support articles, or transfer issues to a human representative. | |
14 | | |
15 | ## Types of Agent Actions | |
16 | | |
17 | There are multiple types of Agents that take actions differently: | |
18 | | |
19 | | Type of Agent | Description | | |
20 | |------------------------|--------------------------------------------------------------------------------------------------| | |
21 | | JSON Agent | The Action to take is specified in JSON format. | | |
22 | | Code Agent | The Agent writes a code block that is interpreted externally. | | |
23 | | Function-calling Agent | It is a subcategory of the JSON Agent which has been fine-tuned to generate a new message for each action. | | |
24 | | |
25 | Actions themselves can serve many purposes: | |
26 | | |
27 | | Type of Action | Description | | |
28 | |--------------------------|------------------------------------------------------------------------------------------| | |
29 | | Information Gathering | Performing web searches, querying databases, or retrieving documents. | | |
30 | | Tool Usage | Making API calls, running calculations, and executing code. | | |
31 | | Environment Interaction | Manipulating digital interfaces or controlling physical devices. | | |
32 | | Communication | Engaging with users via chat or collaborating with other agents. | | |
33 | | |
34 | One crucial part of an agent is the **ability to STOP generating new tokens when an action is complete**, and that is true for all formats of Agent: JSON, code, or function-calling. This prevents unintended output and ensures that the agent’s response is clear and precise. | |
35 | | |
36 | The LLM only handles text and uses it to describe the action it wants to take and the parameters to supply to the tool. | |
37 | | |
38 | ## The Stop and Parse Approach | |
39 | | |
40 | One key method for implementing actions is the **stop and parse approach**. This method ensures that the agent’s output is structured and predictable: | |
41 | | |
42 | 1. **Generation in a Structured Format**: | |
43 | | |
44 | The agent outputs its intended action in a clear, predetermined format (JSON or code). | |
45 | | |
46 | 2. **Halting Further Generation**: | |
47 | | |
48 | Once the action is complete, **the agent stops generating additional tokens**. This prevents extra or erroneous output. | |
49 | | |
50 | 3. **Parsing the Output**: | |
51 | | |
52 | An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters. | |
53 | | |
54 | For example, an agent needing to check the weather might output: | |
55 | | |
56 | | |
57 | ```json | |
58 | Thought: I need to check the current weather for New York. | |
59 | Action: | |
60 | { | |
61 | "action": "get_weather", | |
62 | "action_input": {"location": "New York"} | |
63 | } | |
64 | ``` | |
65 | The framework can then easily parse the name of the function to call and the arguments to apply. | |
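As a toy sketch of that parsing step (the `TOOLS` registry and the brace-finding logic are simplified assumptions for illustration, not how any particular framework implements it):

```python
# Toy stop-and-parse dispatcher: extract the JSON action from the model's raw
# output and call the matching tool with the supplied arguments.
import json

TOOLS = {"get_weather": lambda location: f"Sunny in {location}"}  # toy tool registry

def parse_and_run(llm_output: str) -> str:
    """Find the JSON object emitted after 'Action' and invoke the named tool."""
    json_start = llm_output.index("{", llm_output.index("Action"))
    action, _ = json.JSONDecoder().raw_decode(llm_output[json_start:])
    return TOOLS[action["action"]](**action["action_input"])

output = """Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}"""
print(parse_and_run(output))  # -> Sunny in New York
```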
66 | | |
67 | This clear, machine-readable format minimizes errors and enables external tools to accurately process the agent’s command. | |
68 | | |
69 | Note: Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments. | |
70 | We'll dive deeper into those types of Agents in a future Unit. | |
71 | | |
72 | ## Code Agents | |
73 | | |
74 | An alternative approach is using *Code Agents*. | |
75 | The idea is: **instead of outputting a simple JSON object**, a Code Agent generates an **executable code block—typically in a high-level language like Python**. | |
76 | | |
77 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/code-vs-json-actions.png" alt="Code Agents" /> | |
78 | | |
79 | This approach offers several advantages: | |
80 | | |
81 | - **Expressiveness:** Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON. | |
82 | - **Modularity and Reusability:** Generated code can include functions and modules that are reusable across different actions or tasks. | |
83 | - **Enhanced Debuggability:** With a well-defined programming syntax, code errors are often easier to detect and correct. | |
84 | - **Direct Integration:** Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making. | |
85 | | |
86 | For example, a Code Agent tasked with fetching the weather might generate the following Python snippet: | |
87 | | |
88 | ```python | |
89 | # Code Agent Example: Retrieve Weather Information | |
90 | def get_weather(city): | |
91 | import requests | |
92 | api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY" | |
93 | response = requests.get(api_url) | |
94 | if response.status_code == 200: | |
95 | data = response.json() | |
96 | return data.get("weather", "No weather information available") | |
97 | else: | |
98 | return "Error: Unable to fetch weather data." | |
99 | | |
100 | # Execute the function and prepare the final answer | |
101 | result = get_weather("New York") | |
102 | final_answer = f"The current weather in New York is: {result}" | |
103 | print(final_answer) | |
104 | ``` | |
105 | | |
106 | In this example, the Code Agent: | |
107 | | |
108 | - Retrieves weather data **via an API call**, | |
109 | - Processes the response, | |
110 | - And uses the print() function to output a final answer. | |
111 | | |
112 | This method **also follows the stop and parse approach** by clearly delimiting the code block and signaling when execution is complete (here, by printing the final_answer). | |
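As a rough illustration, a framework's executor for this pattern might look like the sketch below. The fence-matching regex and the captured-output handling are simplified assumptions, not a specific library's implementation, and real frameworks run the code in a sandboxed interpreter rather than a bare `exec()`.

```python
import contextlib
import io
import re

def run_code_action(llm_output: str) -> str:
    """Extract the Python block from the model's output, run it, and return what it printed."""
    match = re.search(r"```python\n(.*?)```", llm_output, re.DOTALL)
    if match is None:
        return "Error: no code block found."
    buffer = io.StringIO()
    # WARNING: exec() on model-generated code is unsafe; this is for illustration only.
    with contextlib.redirect_stdout(buffer):
        exec(match.group(1), {})
    return buffer.getvalue().strip()

sample = "```python\nprint('The current weather in New York is: sunny')\n```"
print(run_code_action(sample))  # The current weather in New York is: sunny
```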
113 | | |
114 | --- | |
115 | | |
116 | We learned that Actions bridge an agent's internal reasoning and its real-world interactions by executing clear, structured tasks—whether through JSON, code, or function calls. | |
117 | | |
118 | This deliberate execution ensures that each action is precise and ready for external processing via the stop and parse approach. In the next section, we will explore Observations to see how agents capture and integrate feedback from their environment. | |
119 | | |
120 | After this, we will **finally be ready to build our first Agent!** | |
121 | | |
122 | | |
123 | | |
124 | | |
125 | | |
126 | | |
127 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/agent-steps-and-structure.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Understanding AI Agents through the Thought-Action-Observation Cycle | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-3.jpg" alt="Unit 1 planning"/> | |
4 | | |
5 | In the previous sections, we learned: | |
6 | | |
7 | - **How tools are made available to the agent in the system prompt**. | |
8 | - **How AI agents are systems that can 'reason', plan, and interact with their environment**. | |
9 | | |
10 | In this section, **we’ll explore the complete AI Agent Workflow**, a cycle we defined as Thought-Action-Observation. | |
11 | | |
12 | Then, we’ll dive deeper into each of these steps. | |
13 | | |
14 | | |
15 | ## The Core Components | |
16 | | |
17 | Agents work in a continuous cycle of: **thinking (Thought) → acting (Act) → observing (Observe)**. | |
18 | | |
19 | Let’s break down these steps together: | |
20 | | |
21 | 1. **Thought**: The LLM part of the Agent decides what the next step should be. | |
22 | 2. **Action:** The agent takes an action, by calling the tools with the associated arguments. | |
23 | 3. **Observation:** The model reflects on the response from the tool. | |
24 | | |
25 | ## The Thought-Action-Observation Cycle | |
26 | | |
27 | The three components work together in a continuous loop. To use an analogy from programming, the agent uses a **while loop**: the loop continues until the objective of the agent has been fulfilled. | |
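In (very simplified) Python, that loop could look like the sketch below. Every name here (`llm`, `execute_action`, `is_final_answer`) is a hypothetical placeholder used only to illustrate the shape of the cycle, not a real library's API.

```python
def run_agent(task, llm, execute_action, is_final_answer, max_steps=10):
    """A minimal Thought → Act → Observe loop built around placeholder helpers."""
    memory = [f"Task: {task}"]
    steps = 0
    while steps < max_steps:  # loop until the objective is fulfilled (or we give up)
        # Thought (+ proposed Action): the LLM reads the memory and decides the next step.
        output = llm("\n".join(memory))
        if is_final_answer(output):
            return output  # objective fulfilled: exit the loop
        # Act: execute the chosen tool call.
        observation = execute_action(output)
        # Observe: feed the result back into the memory for the next iteration.
        memory.append(output)
        memory.append(f"Observation: {observation}")
        steps += 1
    return "Stopped: maximum number of steps reached."
```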
28 | | |
29 | Visually, it looks like this: | |
30 | | |
31 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AgentCycle.gif" alt="Think, Act, Observe cycle"/> | |
32 | | |
33 | In many Agent frameworks, **the rules and guidelines are embedded directly into the system prompt**, ensuring that every cycle adheres to a defined logic. | |
34 | | |
35 | In a simplified version, our system prompt may look like this: | |
36 | | |
37 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/system_prompt_cycle.png" alt="Think, Act, Observe cycle"/> | |
38 | | |
39 | We see here that in the System Message we defined: | |
40 | | |
41 | - The *Agent's behavior*. | |
42 | - The *Tools our Agent has access to*, as we described in the previous section. | |
43 | - The *Thought-Action-Observation Cycle*, that we bake into the LLM instructions. | |
44 | | |
45 | Let’s walk through a small example to understand the full process before going deeper into each individual step. | |
46 | | |
47 | ## Alfred, the weather Agent | |
48 | | |
49 | We created Alfred, the Weather Agent. | |
50 | | |
51 | A user asks Alfred: “What’s the current weather in New York?” | |
52 | | |
53 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent.jpg" alt="Alfred Agent"/> | |
54 | | |
55 | Alfred’s job is to answer this query using a weather API tool. | |
56 | | |
57 | Here’s how the cycle unfolds: | |
58 | | |
59 | ### Thought | |
60 | | |
61 | **Internal Reasoning:** | |
62 | | |
63 | Upon receiving the query, Alfred’s internal dialogue might be: | |
64 | | |
65 | *"The user needs current weather information for New York. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details."* | |
66 | | |
67 | This step shows the agent breaking the problem into steps: first, gathering the necessary data. | |
68 | | |
69 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-1.jpg" alt="Alfred Agent"/> | |
70 | | |
71 | ### Action | |
72 | | |
73 | **Tool Usage:** | |
74 | | |
75 | Based on its reasoning, and because it knows about a `get_weather` tool, Alfred prepares a JSON-formatted command that calls the weather API tool. For example, its first action could be: | |
76 | | |
77 | Thought: I need to check the current weather for New York. | |
78 | | |
79 | ```json | |
80 | { | |
81 | "action": "get_weather", | |
82 | "action_input": { | |
83 | "location": "New York" | |
84 | } | |
85 | } | |
86 | ``` | |
87 | | |
88 | Here, the action clearly specifies which tool to call (e.g., get_weather) and what parameter to pass ("location": "New York"). | |
89 | | |
90 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-2.jpg" alt="Alfred Agent"/> | |
91 | | |
92 | ### Observation | |
93 | | |
94 | **Feedback from the Environment:** | |
95 | | |
96 | After the tool call, Alfred receives an observation. This might be the raw weather data from the API such as: | |
97 | | |
98 | *"Current weather in New York: partly cloudy, 15°C, 60% humidity."* | |
99 | | |
100 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-3.jpg" alt="Alfred Agent"/> | |
101 | | |
102 | This observation is then added to the prompt as additional context. It functions as real-world feedback, confirming whether the action succeeded and providing the needed details. | |
103 | | |
104 | | |
105 | ### Updated thought | |
106 | | |
107 | **Reflecting:** | |
108 | | |
109 | With the observation in hand, Alfred updates its internal reasoning: | |
110 | | |
111 | *"Now that I have the weather data for New York, I can compile an answer for the user."* | |
112 | | |
113 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-4.jpg" alt="Alfred Agent"/> | |
114 | | |
115 | | |
116 | ### Final Action | |
117 | | |
118 | Alfred then generates a final response formatted as we told it to: | |
119 | | |
120 | Thought: I have the weather data now. The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity. | |
121 | | |
122 | Final answer: The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity. | |
123 | | |
124 | This final action sends the answer back to the user, closing the loop. | |
125 | | |
126 | | |
127 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-5.jpg" alt="Alfred Agent"/> | |
128 | | |
129 | | |
130 | What we see in this example: | |
131 | | |
132 | - **Agents iterate through a loop until the objective is fulfilled:** | |
133 | | |
134 | **Alfred’s process is cyclical**. It starts with a thought, then acts by calling a tool, and finally observes the outcome. If the observation had indicated an error or incomplete data, Alfred could have re-entered the cycle to correct its approach. | |
135 | | |
136 | - **Tool Integration:** | |
137 | | |
138 | The ability to call a tool (like a weather API) enables Alfred to go **beyond static knowledge and retrieve real-time data**, an essential aspect of many AI Agents. | |
139 | | |
140 | - **Dynamic Adaptation:** | |
141 | | |
142 | Each cycle allows the agent to incorporate fresh information (observations) into its reasoning (thought), ensuring that the final answer is well-informed and accurate. | |
143 | | |
144 | This example showcases the core concept behind the *ReAct cycle* (a concept we're going to develop in the next section): **the interplay of Thought, Action, and Observation empowers AI agents to solve complex tasks iteratively**. | |
145 | | |
146 | By understanding and applying these principles, you can design agents that not only reason about their tasks but also **effectively utilize external tools to complete them**, all while continuously refining their output based on environmental feedback. | |
147 | | |
148 | --- | |
149 | | |
150 | Let’s now dive deeper into Thought, Action, and Observation as the individual steps of the process. | |
151 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion [[conclusion]] | |
2 | | |
3 | Congratulations on finishing this first Unit 🥳 | |
4 | | |
5 | You've just **mastered the fundamentals of Agents** and you've created your first AI Agent! | |
6 | | |
7 | It's **normal if you still feel confused by some of these elements**. Agents are a complex topic and it's common to take a while to grasp everything. | |
8 | | |
9 | **Take time to really grasp the material** before continuing. It’s important to master these elements and have a solid foundation before entering the fun part. | |
10 | | |
11 | And if you pass the quiz, don't forget to get your certificate 🎓 👉 [here](https://huggingface.co/spaces/agents-course/unit1-certification-app) | |
12 | | |
13 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/> | |
14 | | |
15 | In the next (bonus) unit, you're going to learn **to fine-tune an Agent to do function calling (i.e., to be able to call tools based on the user prompt)**. | |
16 | | |
17 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog) | |
18 | | |
19 | ### Keep Learning, stay awesome 🤗 | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/final-quiz.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Unit 1 Quiz | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub4DONE.jpg" alt="Unit 1 planning"/> | |
4 | | |
5 | Well done on working through the first unit! Let's test your understanding of the key concepts covered so far. | |
6 | | |
7 | When you pass the quiz, proceed to the next section to claim your certificate. | |
8 | | |
9 | Good luck! | |
10 | | |
11 | ## Quiz | |
12 | | |
13 | Here is the interactive quiz, hosted in a Space on the Hugging Face Hub. It will take you through a set of multiple-choice questions to test your understanding of the key concepts covered in this unit. Once you've completed the quiz, you'll be able to see your score and a breakdown of the correct answers. | |
14 | | |
15 | One important thing: **don't forget to click on Submit after you pass; otherwise, your exam score will not be saved!** | |
16 | | |
17 | <iframe | |
18 | src="https://agents-course-unit-1-quiz.hf.space" | |
19 | frameborder="0" | |
20 | width="850" | |
21 | height="450" | |
22 | ></iframe> | |
23 | | |
24 | You can also access the quiz 👉 [here](https://huggingface.co/spaces/agents-course/unit_1_quiz) | |
25 | | |
26 | ## Certificate | |
27 | | |
28 | Now that you have successfully passed the quiz, **you can get your certificate 🎓** | |
29 | | |
30 | When you complete the quiz, it will grant you access to a certificate of completion for this unit. You can download and share this certificate to showcase your progress in the course. | |
31 | | |
32 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub5DONE.jpg" alt="Unit 1 planning"/> | |
33 | | |
34 | Once you receive your certificate, you can add it to your LinkedIn 🧑💼 or share it on X, Bluesky, etc. **We would be super proud and would love to congratulate you if you tag @huggingface**! 🤗 | |
35 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to Agents | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/thumbnail.jpg" alt="Thumbnail"/> | |
4 | | |
5 | Welcome to this first unit, where **you'll build a solid foundation in the fundamentals of AI Agents** including: | |
6 | | |
7 | - **Understanding Agents** | |
8 | - What is an Agent, and how does it work? | |
9 | - How do Agents make decisions using reasoning and planning? | |
10 | | |
11 | - **The Role of LLMs (Large Language Models) in Agents** | |
12 | - How LLMs serve as the “brain” behind an Agent. | |
13 | - How LLMs structure conversations via the Messages system. | |
14 | | |
15 | - **Tools and Actions** | |
16 | - How Agents use external tools to interact with the environment. | |
17 | - How to build and integrate tools for your Agent. | |
18 | | |
19 | - **The Agent Workflow:** | |
20 | - *Think* → *Act* → *Observe*. | |
21 | | |
22 | After exploring these topics, **you’ll build your first Agent** using `smolagents`! | |
23 | | |
24 | Your Agent, named Alfred, will handle a simple task and demonstrate how to apply these concepts in practice. | |
25 | | |
26 | You’ll even learn how to **publish your Agent on Hugging Face Spaces**, so you can share it with friends and colleagues. | |
27 | | |
28 | Finally, at the end of this Unit, you'll take a quiz. Pass it, and you'll **earn your first course certification**: the 🎓 Certificate of Fundamentals of Agents. | |
29 | | |
30 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/> | |
31 | | |
32 | This Unit is your **essential starting point**, laying the groundwork for understanding Agents before you move on to more advanced topics. | |
33 | | |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="Unit 1 planning"/> | |
35 | | |
36 | It's a big unit, so **take your time** and don’t hesitate to come back to these sections from time to time. | |
37 | | |
38 | Ready? Let’s dive in! 🚀 | |
39 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/observations.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Observe: Integrating Feedback to Reflect and Adapt | |
2 | | |
3 | Observations are **how an Agent perceives the consequences of its actions**. | |
4 | | |
5 | They provide crucial information that fuels the Agent's thought process and guides future actions. | |
6 | | |
7 | They are **signals from the environment**—whether it’s data from an API, error messages, or system logs—that guide the next cycle of thought. | |
8 | | |
9 | In the observation phase, the agent: | |
10 | | |
11 | - **Collects Feedback:** Receives data or confirmation that its action was successful (or not). | |
12 | - **Appends Results:** Integrates the new information into its existing context, effectively updating its memory. | |
13 | - **Adapts its Strategy:** Uses this updated context to refine subsequent thoughts and actions. | |
14 | | |
15 | For example, if a weather API returns the data *"partly cloudy, 15°C, 60% humidity"*, this observation is appended to the agent’s memory (at the end of the prompt). | |
16 | | |
17 | The Agent then uses it to decide whether additional information is needed or if it’s ready to provide a final answer. | |
18 | | |
19 | This **iterative incorporation of feedback ensures the agent remains dynamically aligned with its goals**, constantly learning and adjusting based on real-world outcomes. | |
20 | | |
21 | These observations **can take many forms**, from reading webpage text to monitoring a robot arm's position. They can be seen as Tool "logs" that provide textual feedback on the Action's execution. | |
22 | | |
23 | | Type of Observation | Example | | |
24 | |---------------------|---------------------------------------------------------------------------| | |
25 | | System Feedback | Error messages, success notifications, status codes | | |
26 | | Data Changes | Database updates, file system modifications, state changes | | |
27 | | Environmental Data | Sensor readings, system metrics, resource usage | | |
28 | | Response Analysis | API responses, query results, computation outputs | | |
29 | | Time-based Events | Deadlines reached, scheduled tasks completed | | |
30 | | |
31 | ## How Are the Results Appended? | |
32 | | |
33 | After performing an action, the framework follows these steps in order (a minimal sketch follows the list): | |
34 | | |
35 | 1. **Parse the action** to identify the function(s) to call and the argument(s) to use. | |
36 | 2. **Execute the action.** | |
37 | 3. **Append the result** as an **Observation**. | |
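Put together, a minimal sketch of these three steps could look like this. The `parse_action` helper, the `tools` dict, and the chat-message format are assumptions for illustration only; frameworks differ in how they represent Observations.

```python
def observation_step(llm_output, messages, tools, parse_action):
    """One post-action step: parse the action, execute it, append the Observation."""
    # 1. Parse the action to identify the tool to call and its arguments.
    tool_name, tool_args = parse_action(llm_output)
    # 2. Execute the action.
    result = tools[tool_name](**tool_args)
    # 3. Append the result as an Observation so the next Thought can use it.
    messages.append({"role": "user", "content": f"Observation: {result}"})
```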
38 | | |
39 | --- | |
40 | We've now learned the Agent's Thought-Action-Observation Cycle. | |
41 | | |
42 | If some aspects still seem a bit blurry, don't worry—we'll revisit and deepen these concepts in future Units. | |
43 | | |
44 | Now, it's time to put your knowledge into practice by coding your very first Agent! | |
45 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/quiz1.mdx: | |
-------------------------------------------------------------------------------- | |
1 | ### Q1: What is an Agent? | |
2 | Which of the following best describes an AI Agent? | |
3 | | |
4 | <Question | |
5 | choices={[ | |
6 | { | |
7 | text: "An AI model that can reason, plan, and use tools to interact with its environment to achieve a specific goal.", | |
8 | explain: "This definition captures the essential characteristics of an Agent.", | |
9 | correct: true | |
10 | }, | |
11 | { | |
12 | text: "A system that solely processes static text, without any inherent mechanism to interact dynamically with its surroundings or execute meaningful actions.", | |
13 | explain: "An Agent must be able to take an action and interact with its environment.", | |
14 | }, | |
15 | { | |
16 | text: "A conversational agent restricted to answering queries, lacking the ability to perform any actions or interact with external systems.", | |
17 | explain: "A chatbot like this lacks the ability to take actions, making it different from an Agent.", | |
18 | }, | |
19 | { | |
20 | text: "An online repository of information that offers static content without the capability to execute tasks or interact actively with users.", | |
21 | explain: "An Agent actively interacts with its environment rather than just providing static information.", | |
22 | } | |
23 | ]} | |
24 | /> | |
25 | | |
26 | --- | |
27 | | |
28 | ### Q2: What is the Role of Planning in an Agent? | |
29 | Why does an Agent need to plan before taking an action? | |
30 | | |
31 | <Question | |
32 | choices={[ | |
33 | { | |
34 | text: "To primarily store or recall past interactions, rather than mapping out a sequence of future actions.", | |
35 | explain: "Planning is about determining future actions, not storing past interactions.", | |
36 | }, | |
37 | { | |
38 | text: "To decide on the sequence of actions and select appropriate tools needed to fulfill the user’s request.", | |
39 | explain: "Planning helps the Agent determine the best steps and tools to complete a task.", | |
40 | correct: true | |
41 | }, | |
42 | { | |
43 | text: "To execute a sequence of arbitrary and uncoordinated actions that lack any defined strategy or intentional objective.", | |
44 | explain: "Planning ensures the Agent's actions are intentional and not random.", | |
45 | }, | |
46 | { | |
47 | text: "To merely convert or translate text, bypassing any process of formulating a deliberate sequence of actions or employing strategic reasoning.", | |
48 | explain: "Planning is about structuring actions, not just converting text.", | |
49 | } | |
50 | ]} | |
51 | /> | |
52 | | |
53 | --- | |
54 | | |
55 | ### Q3: How Do Tools Enhance an Agent's Capabilities? | |
56 | Why are tools essential for an Agent? | |
57 | | |
58 | <Question | |
59 | choices={[ | |
60 | { | |
61 | text: "Tools serve no real purpose and do not contribute to the Agent’s ability to perform actions beyond basic text generation.", | |
62 | explain: "Tools expand an Agent's capabilities by allowing it to perform actions beyond text generation.", | |
63 | }, | |
64 | { | |
65 | text: "Tools are solely designed for memory storage, lacking any capacity to facilitate the execution of tasks or enhance interactive performance.", | |
66 | explain: "Tools are primarily for performing actions, not just for storing data.", | |
67 | }, | |
68 | { | |
69 | text: "Tools severely restrict the Agent exclusively to generating text, thereby preventing it from engaging in a broader range of interactive actions.", | |
70 | explain: "On the contrary, tools allow Agents to go beyond text-based responses.", | |
71 | }, | |
72 | { | |
73 | text: "Tools provide the Agent with the ability to execute actions a text-generation model cannot perform natively, such as making coffee or generating images.", | |
74 | explain: "Tools enable Agents to interact with the real world and complete tasks.", | |
75 | correct: true | |
76 | } | |
77 | ]} | |
78 | /> | |
79 | | |
80 | --- | |
81 | | |
82 | ### Q4: How Do Actions Differ from Tools? | |
83 | What is the key difference between Actions and Tools? | |
84 | | |
85 | <Question | |
86 | choices={[ | |
87 | { | |
88 | text: "Actions are the steps the Agent takes, while Tools are external resources the Agent can use to perform those actions.", | |
89 | explain: "Actions are higher-level objectives, while Tools are specific functions the Agent can call upon.", | |
90 | correct: true | |
91 | }, | |
92 | { | |
93 | text: "Actions and Tools are entirely identical components that can be used interchangeably, with no clear differences between them.", | |
94 | explain: "No, Actions are goals or tasks, while Tools are specific utilities the Agent uses to achieve them.", | |
95 | }, | |
96 | { | |
97 | text: "Tools are considered broad utilities available for various functions, whereas Actions are mistakenly thought to be restricted only to physical interactions.", | |
98 | explain: "Not necessarily. Actions can involve both digital and physical tasks.", | |
99 | }, | |
100 | { | |
101 | text: "Actions inherently require the use of LLMs to be determined and executed, whereas Tools are designed to function autonomously without such dependencies.", | |
102 | explain: "While LLMs help decide Actions, Actions themselves are not dependent on LLMs.", | |
103 | } | |
104 | ]} | |
105 | /> | |
106 | | |
107 | --- | |
108 | | |
109 | ### Q5: What Role Do Large Language Models (LLMs) Play in Agents? | |
110 | How do LLMs contribute to an Agent’s functionality? | |
111 | | |
112 | <Question | |
113 | choices={[ | |
114 | { | |
115 | text: "LLMs function merely as passive repositories that store information, lacking any capability to actively process input or produce dynamic responses.", | |
116 | explain: "LLMs actively process text input and generate responses, rather than just storing information.", | |
117 | }, | |
118 | { | |
119 | text: "LLMs serve as the reasoning 'brain' of the Agent, processing text inputs to understand instructions and plan actions.", | |
120 | explain: "LLMs enable the Agent to interpret, plan, and decide on the next steps.", | |
121 | correct: true | |
122 | }, | |
123 | { | |
124 | text: "LLMs are erroneously believed to be used solely for image processing, when in fact their primary function is to process and generate text.", | |
125 | explain: "LLMs primarily work with text, although they can sometimes interact with multimodal inputs.", | |
126 | }, | |
127 | { | |
128 | text: "LLMs are considered completely irrelevant to the operation of AI Agents, implying that they are entirely superfluous in any practical application.", | |
129 | explain: "LLMs are a core component of modern AI Agents.", | |
130 | } | |
131 | ]} | |
132 | /> | |
133 | | |
134 | --- | |
135 | | |
136 | ### Q6: Which of the Following Best Demonstrates an AI Agent? | |
137 | Which real-world example best illustrates an AI Agent at work? | |
138 | | |
139 | <Question | |
140 | choices={[ | |
141 | { | |
142 | text: "A static FAQ page on a website that provides fixed information and lacks any interactive or dynamic response capabilities.", | |
143 | explain: "A static FAQ page does not interact dynamically with users or take actions.", | |
144 | }, | |
145 | { | |
146 | text: "A simple calculator that performs arithmetic operations based on fixed rules, without any capability for reasoning or planning.", | |
147 | explain: "A calculator follows fixed rules without reasoning or planning, so it is not an Agent.", | |
148 | }, | |
149 | { | |
150 | text: "A virtual assistant like Siri or Alexa that can understand spoken commands, reason through them, and perform tasks like setting reminders or sending messages.", | |
151 | explain: "This example includes reasoning, planning, and interaction with the environment.", | |
152 | correct: true | |
153 | }, | |
154 | { | |
155 | text: "A video game NPC that operates on a fixed script of responses, without the ability to reason, plan, or use external tools.", | |
156 | explain: "Unless the NPC can reason, plan, and use tools, it does not function as an AI Agent.", | |
157 | } | |
158 | ]} | |
159 | /> | |
160 | | |
161 | --- | |
162 | | |
163 | Congrats on finishing this Quiz 🥳! If you need to review any elements, take the time to revisit the chapter to reinforce your knowledge before diving deeper into the "Agent's brain": LLMs. | |
164 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/quiz2.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Quick Self-Check (ungraded) [[quiz2]] | |
2 | | |
3 | | |
4 | What?! Another Quiz? We know, we know, ... 😅 But this short, ungraded quiz is here to **help you reinforce key concepts you've just learned**. | |
5 | | |
6 | This quiz covers Large Language Models (LLMs), message systems, and tools: essential components for understanding and building AI agents. | |
7 | | |
8 | ### Q1: Which of the following best describes an AI tool? | |
9 | | |
10 | <Question | |
11 | choices={[ | |
12 | { | |
13 | text: "A process that only generates text responses", | |
14 | explain: "", | |
15 | }, | |
16 | { | |
17 | text: "An executable process or external API that allows agents to perform specific tasks and interact with external environments", | |
18 | explain: "Tools are executable functions that agents can use to perform specific tasks and interact with external environments.", | |
19 | correct: true | |
20 | }, | |
21 | { | |
22 | text: "A feature that stores agent conversations", | |
23 | explain: "", | |
24 | } | |
25 | ]} | |
26 | /> | |
27 | | |
28 | --- | |
29 | | |
30 | ### Q2: How do AI agents use tools as a form of "acting" in an environment? | |
31 | | |
32 | <Question | |
33 | choices={[ | |
34 | { | |
35 | text: "By passively waiting for user instructions", | |
36 | explain: "", | |
37 | }, | |
38 | { | |
39 | text: "By only using pre-programmed responses", | |
40 | explain: "", | |
41 | }, | |
42 | { | |
43 | text: "By asking the LLM to generate tool invocation code when appropriate and running tools on behalf of the model", | |
44 | explain: "Agents can invoke tools and use reasoning to plan and re-plan based on the information gained.", | |
45 | correct: true | |
46 | } | |
47 | ]} | |
48 | /> | |
49 | | |
50 | --- | |
51 | | |
52 | ### Q3: What is a Large Language Model (LLM)? | |
53 | | |
54 | <Question | |
55 | choices={[ | |
56 | { | |
57 | text: "A simple chatbot designed to respond with pre-defined answers", | |
58 | explain: "", | |
59 | }, | |
60 | { | |
61 | text: "A deep learning model trained on large amounts of text to understand and generate human-like language", | |
62 | explain: "", | |
63 | correct: true | |
64 | }, | |
65 | { | |
66 | text: "A rule-based AI that follows strict predefined commands", | |
67 | explain: "", | |
68 | } | |
69 | ]} | |
70 | /> | |
71 | | |
72 | --- | |
73 | | |
74 | ### Q4: Which of the following best describes the role of special tokens in LLMs? | |
75 | | |
76 | <Question | |
77 | choices={[ | |
78 | { | |
79 | text: "They are additional words stored in the model's vocabulary to enhance text generation quality", | |
80 | explain: "", | |
81 | }, | |
82 | { | |
83 | text: "They serve specific functions like marking the end of a sequence (EOS) or separating different message roles in chat models", | |
84 | explain: "", | |
85 | correct: true | |
86 | }, | |
87 | { | |
88 | text: "They are randomly inserted tokens used to improve response variability", | |
89 | explain: "", | |
90 | } | |
91 | ]} | |
92 | /> | |
93 | | |
94 | --- | |
95 | | |
96 | ### Q5: How do AI chat models process user messages internally? | |
97 | | |
98 | <Question | |
99 | choices={[ | |
100 | { | |
101 | text: "They directly interpret messages as structured commands with no transformations", | |
102 | explain: "", | |
103 | }, | |
104 | { | |
105 | text: "They convert user messages into a formatted prompt by concatenating system, user, and assistant messages", | |
106 | explain: "", | |
107 | correct: true | |
108 | }, | |
109 | { | |
110 | text: "They generate responses randomly based on previous conversations", | |
111 | explain: "", | |
112 | } | |
113 | ]} | |
114 | /> | |
115 | | |
116 | --- | |
117 | | |
118 | | |
119 | Got it? Great! Now let's **dive into the complete Agent flow and start building your first AI Agent!** | |
120 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/thoughts.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Thought: Internal Reasoning and the ReAct Approach | |
2 | | |
3 | <Tip> | |
4 | In this section, we dive into the inner workings of an AI agent—its ability to reason and plan. We’ll explore how the agent leverages its internal dialogue to analyze information, break down complex problems into manageable steps, and decide what action to take next. Additionally, we introduce the ReAct approach, a prompting technique that encourages the model to think “step by step” before acting. | |
5 | </Tip> | |
6 | | |
7 | Thoughts represent the **Agent's internal reasoning and planning processes** to solve the task. | |
8 | | |
9 | This utilizes the agent's Large Language Model (LLM) capacity **to analyze information when presented in its prompt**. | |
10 | | |
11 | Think of it as the agent's internal dialogue, where it considers the task at hand and strategizes its approach. | |
12 | | |
13 | The Agent's thoughts are responsible for assessing current observations and deciding what the next action(s) should be. | |
14 | | |
15 | Through this process, the agent can **break down complex problems into smaller, more manageable steps**, reflect on past experiences, and continuously adjust its plans based on new information. | |
16 | | |
17 | Here are some examples of common thoughts: | |
18 | | |
19 | | Type of Thought | Example | | |
20 | |----------------|---------| | |
21 | | Planning | "I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report" | | |
22 | | Analysis | "Based on the error message, the issue appears to be with the database connection parameters" | | |
23 | | Decision Making | "Given the user's budget constraints, I should recommend the mid-tier option" | | |
24 | | Problem Solving | "To optimize this code, I should first profile it to identify bottlenecks" | | |
25 | | Memory Integration | "The user mentioned their preference for Python earlier, so I'll provide examples in Python" | | |
26 | | Self-Reflection | "My last approach didn't work well, I should try a different strategy" | | |
27 | | Goal Setting | "To complete this task, I need to first establish the acceptance criteria" | | |
28 | | Prioritization | "The security vulnerability should be addressed before adding new features" | | |
29 | | |
30 | > **Note:** In the case of LLMs fine-tuned for function-calling, the thought process is optional. | |
31 | > *In case you're not familiar with function-calling, there will be more details in the Actions section.* | |
32 | | |
33 | ## The ReAct Approach | |
34 | | |
35 | A key method is the **ReAct approach**, which is the concatenation of "Reasoning" (Think) with "Acting" (Act). | |
36 | | |
37 | ReAct is a simple prompting technique that appends "Let's think step by step" before letting the LLM decode the next tokens. | |
38 | | |
39 | Indeed, prompting the model to think "step by step" encourages the decoding process toward next tokens **that generate a plan**, rather than a final solution, since the model is encouraged to **decompose** the problem into *sub-tasks*. | |
40 | | |
41 | This allows the model to consider sub-steps in more detail, which in general leads to fewer errors than trying to generate the final solution directly. | |
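In practice, this amounts to little more than adding the phrase to the prompt before decoding. A tiny, hypothetical sketch (where `generate` stands in for whatever LLM call you use):

```python
def react_style_prompt(question: str) -> str:
    """Append the step-by-step nudge so the model plans sub-steps before answering."""
    return f"{question}\nLet's think step by step."

# Hypothetical usage:
# answer = generate(react_style_prompt("How many hours are there in two weeks?"))
```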
42 | | |
43 | <figure> | |
44 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/ReAct.png" alt="ReAct"/> | |
45 | <figcaption>(d) is an example of the ReAct approach, where we prompt "Let's think step by step" | |
46 | </figcaption> | |
47 | </figure> | |
48 | | |
49 | <Tip> | |
50 | We have recently seen a lot of interest in reasoning strategies. This is what's behind models like DeepSeek R1 or OpenAI's o1, which have been fine-tuned to "think before answering". | |
51 | | |
52 | These models have been trained to always include specific _thinking_ sections (enclosed between `<think>` and `</think>` special tokens). This is not just a prompting technique like ReAct, but a training method where the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do. | |
53 | </Tip> | |
54 | | |
55 | --- | |
56 | Now that we better understand the Thought process, let's go deeper on the second part of the process: Act. | |
57 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/tutorial.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Let's Create Our First Agent Using smolagents | |
2 | | |
3 | In the last section, we learned how we can create Agents from scratch using Python code, and we **saw just how tedious that process can be**. Fortunately, many Agent libraries simplify this work by **handling much of the heavy lifting for you**. | |
4 | | |
5 | In this tutorial, **you'll create your very first Agent** capable of performing actions such as image generation, web search, time zone checking and much more! | |
6 | | |
7 | You will also publish your agent **on a Hugging Face Space so you can share it with friends and colleagues**. | |
8 | | |
9 | Let's get started! | |
10 | | |
11 | | |
12 | ## What is smolagents? | |
13 | | |
14 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/smolagents.png" alt="smolagents"/> | |
15 | | |
16 | To make this Agent, we're going to use `smolagents`, a library that **provides a framework for developing your agents with ease**. | |
17 | | |
18 | This lightweight library is designed for simplicity, but it abstracts away much of the complexity of building an Agent, allowing you to focus on designing your agent's behavior. | |
19 | | |
20 | We're going to get deeper into smolagents in the next Unit. Meanwhile, you can also check this <a href="https://huggingface.co/blog/smolagents" target="_blank">blog post</a> or the library's <a href="https://github.com/huggingface/smolagents" target="_blank">repo in GitHub</a>. | |
21 | | |
22 | In short, `smolagents` is a library that focuses on **CodeAgent**, a kind of agent that performs **"Actions"** through code blocks and then **"Observes"** the results by executing the code. | |
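As a quick preview, a minimal `CodeAgent` can be created in just a few lines (this sketch assumes you have a Hugging Face token configured for inference; we'll build a richer version step by step below):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A CodeAgent writes Python code as its "Action", executes it,
# and "Observes" the result before producing a final answer.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("Search the web for today's weather in Paris and summarize it in one sentence.")
```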
23 | | |
24 | Here is an example of what we'll build! | |
25 | | |
26 | We provided our agent with an **Image generation tool** and asked it to generate an image of a cat. | |
27 | | |
28 | The agent inside `smolagents` is going to have the **same behavior as the custom one we built previously**: it's going **to think, act, and observe in a cycle** until it reaches a final answer: | |
29 | | |
30 | <iframe width="560" height="315" src="https://www.youtube.com/embed/PQDKcWiuln4?si=ysSTDZoi8y55FVvA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | |
31 | | |
32 | Exciting, right? | |
33 | | |
34 | ## Let's build our Agent! | |
35 | | |
36 | To start, duplicate this Space: <a href="https://huggingface.co/spaces/agents-course/First_agent_template" target="_blank">https://huggingface.co/spaces/agents-course/First_agent_template</a> | |
37 | > Thanks to <a href="https://huggingface.co/m-ric" target="_blank">Aymeric</a> for this template! 🙌 | |
38 | | |
39 | | |
40 | Duplicating this Space means **creating a copy of it on your own profile**: | |
41 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/duplicate-space.gif" alt="Duplicate"/> | |
42 | | |
43 | After duplicating the Space, you'll need to add your Hugging Face API token so your agent can access the model API: | |
44 | | |
45 | 1. First, get your Hugging Face token from [https://hf.co/settings/tokens](https://hf.co/settings/tokens) with inference permissions, if you don't already have one | |
46 | 2. Go to your duplicated Space and click on the **Settings** tab | |
47 | 3. Scroll down to the **Variables and Secrets** section and click **New Secret** | |
48 | 4. Create a secret with the name `HF_TOKEN` and paste your token as the value | |
49 | 5. Click **Save** to store your token securely | |
50 | | |
51 | Throughout this lesson, the only file you will need to modify is the (currently incomplete) **"app.py"**. You can see the [original one in the template](https://huggingface.co/spaces/agents-course/First_agent_template/blob/main/app.py) here. To find yours, go to your copy of the Space, click the `Files` tab, and then open `app.py` in the directory listing. | |
52 | | |
53 | Let's break down the code together: | |
54 | | |
55 | - The file begins with some simple but necessary library imports | |
56 | | |
57 | ```python | |
58 | from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, HfApiModel, load_tool, tool | |
59 | import datetime | |
60 | import requests | |
61 | import pytz | |
62 | import yaml | |
63 | ``` | |
64 | | |
65 | As outlined earlier, we will directly use the **CodeAgent** class from **smolagents**. | |
66 | | |
67 | | |
68 | ### The Tools | |
69 | | |
70 | Now let's get into the tools! If you want a refresher about tools, don't hesitate to go back to the [Tools](tools) section of the course. | |
71 | | |
72 | ```python | |
73 | @tool | |
74 | def my_custom_tool(arg1:str, arg2:int)-> str: # it's important to specify the return type | |
75 | # Keep this format for the tool description / args description but feel free to modify the tool | |
76 | """A tool that does nothing yet | |
77 | Args: | |
78 | arg1: the first argument | |
79 | arg2: the second argument | |
80 | """ | |
81 | return "What magic will you build ?" | |
82 | | |
83 | @tool | |
84 | def get_current_time_in_timezone(timezone: str) -> str: | |
85 | """A tool that fetches the current local time in a specified timezone. | |
86 | Args: | |
87 | timezone: A string representing a valid timezone (e.g., 'America/New_York'). | |
88 | """ | |
89 | try: | |
90 | # Create timezone object | |
91 | tz = pytz.timezone(timezone) | |
92 | # Get current time in that timezone | |
93 | local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S") | |
94 | return f"The current local time in {timezone} is: {local_time}" | |
95 | except Exception as e: | |
96 | return f"Error fetching time for timezone '{timezone}': {str(e)}" | |
97 | ``` | |
98 | | |
99 | | |
100 | The Tools are what we are encouraging you to build in this section! We give you two examples: | |
101 | | |
102 | 1. A **non-working dummy Tool** that you can modify to make something useful. | |
103 | 2. An **actually working Tool** that gets the current time somewhere in the world. | |
104 | | |
105 | To define your tool it is important to: | |
106 | | |
107 | 1. Provide input and output types for your function, like in `get_current_time_in_timezone(timezone: str) -> str:` | |
108 | 2. **Provide a well-formatted docstring**. `smolagents` expects every argument to have a **textual description in the docstring** (see the example right after this list). | |
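For instance, a hypothetical extra tool that respects both rules might look like this (a toy example you could add to `app.py`, where `tool` is already imported; it is not part of the template):

```python
@tool
def get_travel_duration(distance_km: float, speed_kmh: float) -> str:
    """A toy tool that estimates travel time for a given distance and speed.
    Args:
        distance_km: The distance to travel, in kilometers.
        speed_kmh: The average speed, in kilometers per hour.
    """
    return f"Estimated travel time: {distance_km / speed_kmh:.1f} hours"
```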
109 | | |
110 | ### The Agent | |
111 | | |
112 | It uses [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) as the LLM engine. This is a very capable model that we'll access via the serverless API. | |
113 | | |
114 | ```python | |
115 | final_answer = FinalAnswerTool() | |
116 | model = HfApiModel( | |
117 | max_tokens=2096, | |
118 | temperature=0.5, | |
119 | model_id='Qwen/Qwen2.5-Coder-32B-Instruct', | |
120 | custom_role_conversions=None, | |
121 | ) | |
122 | | |
123 | with open("prompts.yaml", 'r') as stream: | |
124 | prompt_templates = yaml.safe_load(stream) | |
125 | | |
126 | # We're creating our CodeAgent | |
127 | agent = CodeAgent( | |
128 | model=model, | |
129 | tools=[final_answer], # add your tools here (don't remove final_answer) | |
130 | max_steps=6, | |
131 | verbosity_level=1, | |
132 | grammar=None, | |
133 | planning_interval=None, | |
134 | name=None, | |
135 | description=None, | |
136 | prompt_templates=prompt_templates | |
137 | ) | |
138 | | |
139 | GradioUI(agent).launch() | |
140 | ``` | |
141 | | |
142 | Behind the **HfApiModel** class, this Agent still uses the `InferenceClient` we saw in an earlier section! | |
143 | | |
144 | We will give more in-depth examples when we present the framework in Unit 2. For now, you need to focus on **adding new tools to the list of tools** using the `tools` parameter of your Agent. | |
145 | | |
146 | For example, you could use the `DuckDuckGoSearchTool` that was imported in the first line of the code, or you can examine the `image_generation_tool` that is loaded from the Hub later in the code. | |
147 | | |
148 | **Adding tools will give your agent new capabilities**, so try to be creative here! | |
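Concretely, extending the template could look like the snippet below. Keep `final_answer` in the list, and note that `image_generation_tool` only exists once it has been loaded from the Hub further down in `app.py`.

```python
agent = CodeAgent(
    model=model,
    tools=[final_answer, DuckDuckGoSearchTool(), image_generation_tool],  # new tools added here
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates
)
```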
149 | | |
150 | ### The System Prompt | |
151 | | |
152 | The agent's system prompt is stored in a separate `prompts.yaml` file. This file contains predefined instructions that guide the agent's behavior. | |
153 | | |
154 | Storing prompts in a YAML file allows for easy customization and reuse across different agents or use cases. | |
155 | | |
156 | You can check the [Space's file structure](https://huggingface.co/spaces/agents-course/First_agent_template/tree/main) to see where the `prompts.yaml` file is located and how it's organized within the project. | |
157 | | |
158 | The complete "app.py": | |
159 | | |
160 | ```python | |
161 | from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool | |
162 | import datetime | |
163 | import requests | |
164 | import pytz | |
165 | import yaml | |
166 | from tools.final_answer import FinalAnswerTool | |
167 | | |
168 | from Gradio_UI import GradioUI | |
169 | | |
170 | # Below is an example of a tool that does nothing. Amaze us with your creativity! | |
171 | @tool | |
172 | def my_custom_tool(arg1:str, arg2:int)-> str: # it's important to specify the return type | |
173 | # Keep this format for the tool description / args description but feel free to modify the tool | |
174 | """A tool that does nothing yet | |
175 | Args: | |
176 | arg1: the first argument | |
177 | arg2: the second argument | |
178 | """ | |
179 | return "What magic will you build ?" | |
180 | | |
181 | @tool | |
182 | def get_current_time_in_timezone(timezone: str) -> str: | |
183 | """A tool that fetches the current local time in a specified timezone. | |
184 | Args: | |
185 | timezone: A string representing a valid timezone (e.g., 'America/New_York'). | |
186 | """ | |
187 | try: | |
188 | # Create timezone object | |
189 | tz = pytz.timezone(timezone) | |
190 | # Get current time in that timezone | |
191 | local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S") | |
192 | return f"The current local time in {timezone} is: {local_time}" | |
193 | except Exception as e: | |
194 | return f"Error fetching time for timezone '{timezone}': {str(e)}" | |
195 | | |
196 | | |
197 | final_answer = FinalAnswerTool() | |
198 | model = HfApiModel( | |
199 | max_tokens=2096, | |
200 | temperature=0.5, | |
201 | model_id='Qwen/Qwen2.5-Coder-32B-Instruct', | |
202 | custom_role_conversions=None, | |
203 | ) | |
204 | | |
205 | | |
206 | # Import tool from Hub | |
207 | image_generation_tool = load_tool("agents-course/text-to-image", trust_remote_code=True) | |
208 | | |
209 | # Load system prompt from the prompts.yaml file | |
210 | with open("prompts.yaml", 'r') as stream: | |
211 | prompt_templates = yaml.safe_load(stream) | |
212 | | |
213 | agent = CodeAgent( | |
214 | model=model, | |
215 | tools=[final_answer], # add your tools here (don't remove final_answer) | |
216 | max_steps=6, | |
217 | verbosity_level=1, | |
218 | grammar=None, | |
219 | planning_interval=None, | |
220 | name=None, | |
221 | description=None, | |
222 | prompt_templates=prompt_templates # Pass system prompt to CodeAgent | |
223 | ) | |
224 | | |
225 | | |
226 | GradioUI(agent).launch() | |
227 | ``` | |
228 | | |
229 | Your **Goal** is to get familiar with the Space and the Agent. | |
230 | | |
231 | Currently, the agent in the template **does not use any tools, so try to provide it with some of the pre-made ones or even make some new tools yourself!** | |
232 | | |
233 | We are eagerly waiting to see your amazing agents' output in the Discord channel **#agents-course-showcase**! | |
234 | | |
235 | | |
236 | --- | |
237 | Congratulations, you've built your first Agent! Don't hesitate to share it with your friends and colleagues. | |
238 | | |
239 | Since this is your first try, it's perfectly normal if it's a little buggy or slow. In future units, we'll learn how to build even better Agents. | |
240 | | |
241 | The best way to learn is to try, so don't hesitate to update it, add more tools, try with another model, etc. | |
242 | | |
243 | In the next section, you're going to take the final Quiz and get your certificate! | |
244 | | |
245 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit1/what-are-agents.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # What is an Agent? | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="Unit 1 planning"/> | |
4 | | |
5 | By the end of this section, you'll feel comfortable with the concept of agents and their various applications in AI. | |
6 | | |
7 | To explain what an Agent is, let's start with an analogy. | |
8 | | |
9 | ## The Big Picture: Alfred The Agent | |
10 | | |
11 | Meet Alfred. Alfred is an **Agent**. | |
12 | | |
13 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/this-is-alfred.jpg" alt="This is Alfred"/> | |
14 | | |
15 | Imagine Alfred **receives a command**, such as: "Alfred, I would like a coffee please." | |
16 | | |
17 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/coffee-please.jpg" alt="I would like a coffee"/> | |
18 | | |
19 | Because Alfred **understands natural language**, he quickly grasps our request. | |
20 | | |
21 | Before fulfilling the order, Alfred engages in **reasoning and planning**, figuring out the steps and tools he needs to: | |
22 | | |
23 | 1. Go to the kitchen | |
24 | 2. Use the coffee machine | |
25 | 3. Brew the coffee | |
26 | 4. Bring the coffee back | |
27 | | |
28 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/reason-and-plan.jpg" alt="Reason and plan"/> | |
29 | | |
30 | Once he has a plan, he **must act**. To execute his plan, **he can use tools from the list of tools he knows about**. | |
31 | | |
32 | In this case, to make a coffee, he uses a coffee machine. He activates the coffee machine to brew the coffee. | |
33 | | |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/make-coffee.jpg" alt="Make coffee"/> | |
35 | | |
36 | Finally, Alfred brings the freshly brewed coffee to us. | |
37 | | |
38 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/bring-coffee.jpg" alt="Bring coffee"/> | |
39 | | |
40 | And this is what an Agent is: an **AI model capable of reasoning, planning, and interacting with its environment**. | |
41 | | |
42 | We call it an Agent because it has _agency_, i.e., the ability to interact with its environment. | |
43 | | |
44 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/process.jpg" alt="Agent process"/> | |
45 | | |
46 | ## Let's go more formal | |
47 | | |
48 | Now that you have the big picture, here’s a more precise definition: | |
49 | | |
50 | > An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks. | |
51 | | |
52 | Think of the Agent as having two main parts: | |
53 | | |
54 | 1. **The Brain (AI Model)** | |
55 | | |
56 | This is where all the thinking happens. The AI model **handles reasoning and planning**. | |
57 | It decides **which Actions to take based on the situation**. | |
58 | | |
59 | 2. **The Body (Capabilities and Tools)** | |
60 | | |
61 | This part represents **everything the Agent is equipped to do**. | |
62 | | |
63 | The **scope of possible actions** depends on what the agent **has been equipped with**. For example, because humans lack wings, they can't perform the "fly" **Action**, but they can execute **Actions** like "walk", "run", "jump", "grab", and so on. | |
64 | | |
65 | ### The spectrum of "Agency" | |
66 | | |
67 | Following this definition, Agents exist on a continuous spectrum of increasing agency: | |
68 | | |
69 | | Agency Level | Description | What that's called | Example pattern | | |
70 | | --- | --- | --- | --- | | |
71 | | ☆☆☆ | Agent output has no impact on program flow | Simple processor | `process_llm_output(llm_response)` | | |
72 | | ★☆☆ | Agent output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` | | |
73 | | ★★☆ | Agent output determines function execution | Tool caller | `run_function(llm_chosen_tool, llm_chosen_args)` | | |
74 | | ★★★ | Agent output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` | | |
75 | | ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` | | |
76 | | |
77 | Table from [smolagents conceptual guide](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents). | |
78 | | |
79 | | |
80 | ## What type of AI Models do we use for Agents? | |
81 | | |
82 | The most common AI model found in Agents is an LLM (Large Language Model), which takes **Text** as an input and outputs **Text** as well. | |
83 | | |
84 | Well-known examples are **GPT-4** from **OpenAI**, **Llama** from **Meta**, **Gemini** from **Google**, etc. These models have been trained on a vast amount of text and are able to generalize well. We will learn more about LLMs in the [next section](what-are-llms). | |
85 | | |
86 | <Tip> | |
87 | It's also possible to use models that accept other inputs as the Agent's core model. For example, a Vision Language Model (VLM), which is like an LLM but also understands images as input. We'll focus on LLMs for now and will discuss other options later. | |
88 | </Tip> | |
89 | | |
90 | ## How does an AI take action on its environment? | |
91 | | |
92 | LLMs are amazing models, but **they can only generate text**. | |
93 | | |
94 | However, if you ask a well-known chat application like HuggingChat or ChatGPT to generate an image, they can! How is that possible? | |
95 | | |
96 | The answer is that the developers of HuggingChat, ChatGPT, and similar apps implemented additional functionality (called **Tools**) that the LLM can use to create images. | |
97 | | |
98 | <figure> | |
99 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/eiffel_brocolis.jpg" alt="Eiffel Brocolis"/> | |
100 | <figcaption>The model used an Image Generation Tool to generate this image. | |
101 | </figcaption> | |
102 | </figure> | |
103 | | |
104 | We will learn more about tools in the [Tools](tools) section. | |
105 | | |
106 | ## What type of tasks can an Agent do? | |
107 | | |
108 | An Agent can perform any task we implement via **Tools** to complete **Actions**. | |
109 | | |
110 | For example, if I write an Agent to act as my personal assistant (like Siri) on my computer, and I ask it to "send an email to my Manager asking to delay today's meeting", I can give it some code to send emails. This will be a new Tool the Agent can use whenever it needs to send an email. We can write it in Python: | |
111 | | |
112 | ```python | |
113 | def send_message_to(recipient, message): | |
114 | """Useful to send an e-mail message to a recipient""" | |
115 | ... | |
116 | ``` | |
117 | | |
118 | The LLM, as we'll see, will generate code to run the tool when it needs to, and thus fulfill the desired task. | |
119 | | |
120 | ```python | |
121 | send_message_to("Manager", "Can we postpone today's meeting?") | |
122 | ``` | |
123 | | |
124 | The **design of the Tools is very important and has a great impact on the quality of your Agent**. Some tasks will require very specific Tools to be crafted, while others may be solved with general purpose tools like "web_search". | |
125 | | |
126 | > Note that **Actions are not the same as Tools**. An Action, for instance, can involve the use of multiple Tools to complete. | |
127 | | |
128 | Allowing an agent to interact with its environment **allows real-life usage for companies and individuals**. | |
129 | | |
130 | ### Example 1: Personal Virtual Assistants | |
131 | | |
132 | Virtual assistants like Siri, Alexa, or Google Assistant work as agents when they act on behalf of users within their digital environments. | 
133 | | |
134 | They take user queries, analyze context, retrieve information from databases, and provide responses or initiate actions (like setting reminders, sending messages, or controlling smart devices). | |
135 | | |
136 | ### Example 2: Customer Service Chatbots | |
137 | | |
138 | Many companies deploy chatbots as agents that interact with customers in natural language. | |
139 | | |
140 | These agents can answer questions, guide users through troubleshooting steps, open issues in internal databases, or even complete transactions. | |
141 | | |
142 | Their predefined objectives might include improving user satisfaction, reducing wait times, or increasing sales conversion rates. By interacting directly with customers, learning from the dialogues, and adapting their responses over time, they demonstrate the core principles of an agent in action. | |
143 | | |
144 | | |
145 | ### Example 3: AI Non-Playable Character in a video game | |
146 | | |
147 | AI agents powered by LLMs can make Non-Playable Characters (NPCs) more dynamic and unpredictable. | |
148 | | |
149 | Instead of following rigid behavior trees, they can **respond contextually, adapt to player interactions**, and generate more nuanced dialogue. This flexibility helps create more lifelike, engaging characters that evolve alongside the player’s actions. | |
150 | | |
151 | --- | |
152 | | |
153 | To summarize, an Agent is a system that uses an AI Model (typically an LLM) as its core reasoning engine, to: | |
154 | | |
155 | - **Understand natural language:** Interpret and respond to human instructions in a meaningful way. | |
156 | | |
157 | - **Reason and plan:** Analyze information, make decisions, and devise strategies to solve problems. | |
158 | | |
159 | - **Interact with its environment:** Gather information, take actions, and observe the results of those actions. | |
160 | | |
161 | Now that you have a solid grasp of what Agents are, let’s reinforce your understanding with a short, ungraded quiz. After that, we’ll dive into the “Agent’s brain”: the [LLMs](what-are-llms). | |
162 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to Agentic Frameworks | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/thumbnail.jpg" alt="Thumbnail"/> | |
4 | | |
5 | Welcome to this second unit, where **we'll explore different agentic frameworks** that can be used to build powerful agentic applications. | |
6 | | |
7 | We will study: | |
8 | | |
9 | - In Unit 2.1: [smolagents](https://huggingface.co/docs/smolagents/en/index) | |
10 | - In Unit 2.2: [LlamaIndex](https://www.llamaindex.ai/) | |
11 | - In Unit 2.3: [LangGraph](https://www.langchain.com/langgraph) | |
12 | | |
13 | Let's dive in! 🕵 | |
14 | | |
15 | ## When to Use an Agentic Framework | |
16 | | |
17 | An agentic framework is **not always needed when building an application around LLMs**. Frameworks provide flexibility in the workflow to efficiently solve a specific task, but they aren't strictly necessary. | 
18 | | |
19 | Sometimes, **predefined workflows are sufficient** to fulfill user requests, and there is no real need for an agentic framework. If the approach to building an agent is simple, like a chain of prompts, using plain code may be enough. The advantage is that the developer will have **full control and understanding of their system without abstractions**. | 
20 | | |
21 | However, when the workflow becomes more complex, such as letting an LLM call functions or using multiple agents, these abstractions start to become helpful. | |
22 | | |
23 | Considering these ideas, we can already identify the need for some features: | |
24 | | |
25 | * An *LLM engine* that powers the system. | |
26 | * A *list of tools* the agent can access. | |
27 | * A *parser* for extracting tool calls from the LLM output. | |
28 | * A *system prompt* synced with the parser. | |
29 | * A *memory system*. | |
30 | * *Error logging and retry mechanisms* to control LLM mistakes. | |
31 | We'll explore how these topics are addressed in various frameworks, including `smolagents`, `LlamaIndex`, and `LangGraph`. But first, the sketch below shows what handling them by hand in plain Python can look like. | 
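
The following is only a rough sketch: `call_llm` is a hypothetical placeholder for a real LLM API, and the tool registry holds a single toy tool. It exists just to show how the features above (engine, tools, parser, prompt, memory, retries) have to be wired together by hand without a framework.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would call an LLM API.
    if "Tool add returned" in prompt:
        return '{"final": "2 + 3 = 5"}'
    return '{"tool": "add", "args": {"a": 2, "b": 3}}'

# A toy tool registry: the *list of tools* the agent can access.
TOOLS = {"add": lambda a, b: a + b}

# The *system prompt*, kept in sync with the parser below.
SYSTEM_PROMPT = (
    "You can call tools. Reply with JSON such as "
    '{"tool": "<name>", "args": {...}} or {"final": "<answer>"}.'
)

def run_agent(user_query: str, max_steps: int = 5) -> str:
    memory = [f"User: {user_query}"]  # a (very) simple memory system
    for _ in range(max_steps):
        output = call_llm(SYSTEM_PROMPT + "\n" + "\n".join(memory))
        try:
            parsed = json.loads(output)  # parser for extracting tool calls
        except json.JSONDecodeError:
            memory.append("Error: invalid JSON, please retry.")  # retry on mistakes
            continue
        if "final" in parsed:
            return parsed["final"]
        result = TOOLS[parsed["tool"]](**parsed["args"])  # execute the chosen tool
        memory.append(f"Tool {parsed['tool']} returned {result}")
    return "Stopped after too many steps."

print(run_agent("What is 2 + 3?"))
```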
32 | | |
33 | ## Agentic Frameworks Units | |
34 | | |
35 | | Framework | Description | Unit Author | | |
36 | |------------|----------------|----------------| | |
37 | | [smolagents](./smolagents/introduction) | Agents framework developed by Hugging Face. | Sergio Paniego - [HF](https://huggingface.co/sergiopaniego) - [X](https://x.com/sergiopaniego) - [Linkedin](https://www.linkedin.com/in/sergio-paniego-blanco) | | |
38 | | [Llama-Index](./llama-index/introduction) | End-to-end tooling to ship a context-augmented AI agent to production | David Berenstein - [HF](https://huggingface.co/davidberenstein1957) - [X](https://x.com/davidberenstei) - [Linkedin](https://www.linkedin.com/in/davidberenstein) | | 
39 | | [LangGraph](./langgraph/introduction) | Framework for stateful orchestration of agents | Joffrey THOMAS - [HF](https://huggingface.co/Jofthomas) - [X](https://x.com/Jthmas404) - [Linkedin](https://www.linkedin.com/in/joffrey-thomas) | | 
40 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/building_blocks.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Building Blocks of LangGraph | |
2 | | |
3 | To build applications with LangGraph, you need to understand its core components. Let's explore the fundamental building blocks that make up a LangGraph application. | |
4 | | |
5 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Building_blocks.png" alt="Building Blocks" width="70%"/> | |
6 | | |
7 | An application in LangGraph starts from an **entrypoint**, and depending on the execution, the flow may go to one function or another until it reaches the END. | |
8 | | |
9 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/application.png" alt="Application"/> | |
10 | | |
11 | ## 1. State | |
12 | | |
13 | **State** is the central concept in LangGraph. It represents all the information that flows through your application. | |
14 | | |
15 | ```python | |
16 | from typing_extensions import TypedDict | |
17 | | |
18 | class State(TypedDict): | |
19 | graph_state: str | |
20 | ``` | |
21 | | |
22 | The state is **user-defined**, so its fields should be carefully crafted to contain all the data needed for the decision-making process! | 
23 | | |
24 | > 💡 **Tip:** Think carefully about what information your application needs to track between steps. | |
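
For instance, a slightly richer state for the e-mail sorting assistant we'll build later might track several fields at once (this is just an illustration, not the exact state used in the course):

```python
from typing_extensions import TypedDict

class EmailState(TypedDict):
    email_text: str    # the raw e-mail to process
    is_spam: bool      # decision made by one node, read by the next
    draft_reply: str   # filled in by a later node
```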
25 | | |
26 | ## 2. Nodes | |
27 | | |
28 | **Nodes** are Python functions. Each node: | 
29 | - Takes the state as input | |
30 | - Performs some operation | |
31 | - Returns updates to the state | |
32 | | |
33 | ```python | |
34 | def node_1(state): | |
35 | print("---Node 1---") | |
36 | return {"graph_state": state['graph_state'] +" I am"} | |
37 | | |
38 | def node_2(state): | |
39 | print("---Node 2---") | |
40 | return {"graph_state": state['graph_state'] +" happy!"} | |
41 | | |
42 | def node_3(state): | |
43 | print("---Node 3---") | |
44 | return {"graph_state": state['graph_state'] +" sad!"} | |
45 | ``` | |
46 | | |
47 | For example, Nodes can contain: | |
48 | - **LLM calls**: Generate text or make decisions | |
49 | - **Tool calls**: Interact with external systems | |
50 | - **Conditional logic**: Determine next steps | |
51 | - **Human intervention**: Get input from users | |
52 | | |
53 | > 💡 **Info:** Some nodes that every workflow needs, such as START and END, are provided directly by LangGraph. | 
54 | | |
55 | | |
56 | ## 3. Edges | |
57 | | |
58 | **Edges** connect nodes and define the possible paths through your graph: | |
59 | | |
60 | ```python | |
61 | import random | |
62 | from typing import Literal | |
63 | | |
64 | def decide_mood(state) -> Literal["node_2", "node_3"]: | |
65 | | |
66 | # Often, we will use state to decide on the next node to visit | |
67 | user_input = state['graph_state'] | |
68 | | |
69 | # Here, let's just do a 50 / 50 split between nodes 2, 3 | |
70 | if random.random() < 0.5: | |
71 | | |
72 | # 50% of the time, we return Node 2 | |
73 | return "node_2" | |
74 | | |
75 | # 50% of the time, we return Node 3 | |
76 | return "node_3" | |
77 | ``` | |
78 | | |
79 | Edges can be: | |
80 | - **Direct**: Always go from node A to node B | |
81 | - **Conditional**: Choose the next node based on the current state | |
82 | | |
83 | ## 4. StateGraph | |
84 | | |
85 | The **StateGraph** is the container that holds your entire agent workflow: | |
86 | | |
87 | ```python | |
88 | from IPython.display import Image, display | |
89 | from langgraph.graph import StateGraph, START, END | |
90 | | |
91 | # Build graph | |
92 | builder = StateGraph(State) | |
93 | builder.add_node("node_1", node_1) | |
94 | builder.add_node("node_2", node_2) | |
95 | builder.add_node("node_3", node_3) | |
96 | | |
97 | # Logic | |
98 | builder.add_edge(START, "node_1") | |
99 | builder.add_conditional_edges("node_1", decide_mood) | |
100 | builder.add_edge("node_2", END) | |
101 | builder.add_edge("node_3", END) | |
102 | | |
103 | # Add | |
104 | graph = builder.compile() | |
105 | ``` | |
106 | | |
107 | The graph can then be visualized! | 
108 | ```python | |
109 | # View | |
110 | display(Image(graph.get_graph().draw_mermaid_png())) | |
111 | ``` | |
112 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/basic_graph.jpeg" alt="Graph Visualization"/> | |
113 | | |
114 | But most importantly, invoked: | |
115 | ```python | |
116 | graph.invoke({"graph_state" : "Hi, this is Lance."}) | |
117 | ``` | |
118 | Output: | 
119 | ``` | |
120 | ---Node 1--- | |
121 | ---Node 3--- | |
122 | {'graph_state': 'Hi, this is Lance. I am sad!'} | |
123 | ``` | |
124 | | |
125 | ## What's Next? | |
126 | | |
127 | In the next section, we'll put these concepts into practice by building our first graph. This graph lets Alfred take in your e-mails, classify them, and craft a preliminary answer if they are genuine. | |
128 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion | |
2 | | |
3 | Congratulations on finishing the `LangGraph` module of this second Unit! 🥳 | |
4 | | |
5 | You've now mastered the fundamentals of building structured workflows with LangGraph, which you will be able to ship to production. | 
6 | | |
7 | This module is just the beginning of your journey with LangGraph. For more advanced topics, we recommend: | |
8 | | |
9 | - Exploring the [official LangGraph documentation](https://github.com/langchain-ai/langgraph) | |
10 | - Taking the comprehensive [Introduction to LangGraph](https://academy.langchain.com/courses/intro-to-langgraph) course from LangChain Academy | |
11 | - Building something yourself! | 
12 | | |
13 | In the next unit, you'll explore real use cases. It's time to leave theory behind and get into real action! | 
14 | | |
15 | We would greatly appreciate **your thoughts on the course and suggestions for improvement**. If you have feedback, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog) | |
16 | | |
17 | ### Keep Learning, Stay Awesome! 🤗 | |
18 | | |
19 | Good Sir/Madam! 🎩🦇 | |
20 | | |
21 | -Alfred- | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/document_analysis_agent.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Document Analysis Graph | |
2 | | |
3 | Alfred at your service. As Mr. Wayne's trusted butler, I've taken the liberty of documenting how I assist Mr. Wayne with his various documentary needs. While he's out attending to his... nighttime activities, I ensure all his paperwork, training schedules, and nutritional plans are properly analyzed and organized. | 
4 | | |
5 | Before leaving, he left a note with his weekly training program. I then took it upon myself to come up with a **menu** for tomorrow's meals. | 
6 | | |
7 | For future events like this, let's create a document analysis system using LangGraph to serve Mr. Wayne's needs. This system can: | 
8 | | |
9 | 1. Process image documents | 
10 | 2. Extract text using a vision model (Vision Language Model) | 
11 | 3. Perform calculations when needed (to demonstrate normal tools) | |
12 | 4. Analyze content and provide concise summaries | |
13 | 5. Execute specific instructions related to documents | |
14 | | |
15 | ## The Butler's Workflow | |
16 | | |
17 | The workflow we'll build follows this structured schema: | 
18 | | |
19 |  | |
20 | | |
21 | <Tip> | |
22 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/langgraph/agent.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
23 | </Tip> | |
24 | | |
25 | ## Setting Up the environment | |
26 | | |
27 | ```python | |
28 | %pip install langgraph langchain_openai langchain_core | |
29 | ``` | |
30 | and the imports: | 
31 | ```python | |
32 | import base64 | |
33 | from typing import List, TypedDict, Annotated, Optional | |
34 | from langchain_openai import ChatOpenAI | |
35 | from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage | |
36 | from langgraph.graph.message import add_messages | |
37 | from langgraph.graph import START, StateGraph | |
38 | from langgraph.prebuilt import ToolNode, tools_condition | |
39 | from IPython.display import Image, display | |
40 | ``` | |
41 | | |
42 | ## Defining Agent's State | |
43 | | |
44 | This state is a little more complex than the previous ones we have seen. | |
45 | `AnyMessage` is a class from LangChain that defines messages, and `add_messages` is an operator that appends the latest message rather than overwriting it with the latest state (you'll see a small illustration of this right after the state definition). | 
46 | | |
47 | This is a new concept in LangGraph, where you can attach operators to your state to define how its fields should be updated. | 
48 | | |
49 | ```python | |
50 | class AgentState(TypedDict): | |
51 | # The document provided | |
52 | input_file: Optional[str] # Contains file path (PDF/PNG) | |
53 | messages: Annotated[list[AnyMessage], add_messages] | |
54 | ``` | |
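
To see what `add_messages` does in isolation, here is a small illustration (not part of the course notebook): the update is appended to the existing list instead of replacing it.

```python
from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages

existing = [HumanMessage(content="Hello Alfred")]
update = [AIMessage(content="Good evening. How may I assist you?")]

# add_messages merges the two lists, appending rather than overwriting
merged = add_messages(existing, update)
print(len(merged))  # 2
```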
55 | | |
56 | ## Preparing Tools | |
57 | | |
58 | ```python | |
59 | vision_llm = ChatOpenAI(model="gpt-4o") | |
60 | | |
61 | def extract_text(img_path: str) -> str: | |
62 | """ | |
63 | Extract text from an image file using a multimodal model. | |
64 | | |
65 | Master Wayne often leaves notes with his training regimen or meal plans. | |
66 | This allows me to properly analyze the contents. | |
67 | """ | |
68 | all_text = "" | |
69 | try: | |
70 | # Read image and encode as base64 | |
71 | with open(img_path, "rb") as image_file: | |
72 | image_bytes = image_file.read() | |
73 | | |
74 | image_base64 = base64.b64encode(image_bytes).decode("utf-8") | |
75 | | |
76 | # Prepare the prompt including the base64 image data | |
77 | message = [ | |
78 | HumanMessage( | |
79 | content=[ | |
80 | { | |
81 | "type": "text", | |
82 | "text": ( | |
83 | "Extract all the text from this image. " | |
84 | "Return only the extracted text, no explanations." | |
85 | ), | |
86 | }, | |
87 | { | |
88 | "type": "image_url", | |
89 | "image_url": { | |
90 | "url": f"data:image/png;base64,{image_base64}" | |
91 | }, | |
92 | }, | |
93 | ] | |
94 | ) | |
95 | ] | |
96 | | |
97 | # Call the vision-capable model | |
98 | response = vision_llm.invoke(message) | |
99 | | |
100 | # Append extracted text | |
101 | all_text += response.content + "\n\n" | |
102 | | |
103 | return all_text.strip() | |
104 | except Exception as e: | |
105 | # A butler should handle errors gracefully | |
106 | error_msg = f"Error extracting text: {str(e)}" | |
107 | print(error_msg) | |
108 | return "" | |
109 | | |
110 | def divide(a: int, b: int) -> float: | |
111 | """Divide a and b - for Master Wayne's occasional calculations.""" | |
112 | return a / b | |
113 | | |
114 | # Equip the butler with tools | |
115 | tools = [ | |
116 | divide, | |
117 | extract_text | |
118 | ] | |
119 | | |
120 | llm = ChatOpenAI(model="gpt-4o") | |
121 | llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False) | |
122 | ``` | |
123 | | |
124 | ## The nodes | |
125 | | |
126 | ```python | |
127 | def assistant(state: AgentState): | |
128 | # System message | |
129 | textual_description_of_tool=""" | |
130 | extract_text(img_path: str) -> str: | |
131 | Extract text from an image file using a multimodal model. | |
132 | | |
133 | Args: | |
134 | img_path: A local image file path (strings). | |
135 | | |
136 | Returns: | |
137 | A single string containing the concatenated text extracted from each image. | |
138 | divide(a: int, b: int) -> float: | |
139 | Divide a and b | |
140 | """ | |
141 | image=state["input_file"] | |
142 | sys_msg = SystemMessage(content=f"You are an helpful butler named Alfred that serves Mr. Wayne and Batman. You can analyse documents and run computations with provided tools:\n{textual_description_of_tool} \n You have access to some optional images. Currently the loaded image is: {image}") | |
143 | | |
144 | return { | |
145 | "messages": [llm_with_tools.invoke([sys_msg] + state["messages"])], | |
146 | "input_file": state["input_file"] | |
147 | } | |
148 | ``` | |
149 | | |
150 | ## The ReAct Pattern: How I Assist Mr. Wayne | |
151 | | |
152 | Allow me to explain the approach in this agent. The agent follows what's known as the ReAct pattern (Reason-Act-Observe) | |
153 | | |
154 | 1. **Reason** about his documents and requests | |
155 | 2. **Act** by using appropriate tools | |
156 | 3. **Observe** the results | |
157 | 4. **Repeat** as necessary until I've fully addressed his needs | |
158 | | |
159 | This is a simple implementation of an agent using LangGraph. | 
160 | | |
161 | ```python | |
162 | # The graph | |
163 | builder = StateGraph(AgentState) | |
164 | | |
165 | # Define nodes: these do the work | |
166 | builder.add_node("assistant", assistant) | |
167 | builder.add_node("tools", ToolNode(tools)) | |
168 | | |
169 | # Define edges: these determine how the control flow moves | |
170 | builder.add_edge(START, "assistant") | |
171 | builder.add_conditional_edges( | |
172 | "assistant", | |
173 | # If the latest message requires a tool, route to tools | |
174 | # Otherwise, provide a direct response | |
175 | tools_condition, | |
176 | ) | |
177 | builder.add_edge("tools", "assistant") | |
178 | react_graph = builder.compile() | |
179 | | |
180 | # Show the butler's thought process | |
181 | display(Image(react_graph.get_graph(xray=True).draw_mermaid_png())) | |
182 | ``` | |
183 | | |
184 | We define a `tools` node with our list of tools. The `assistant` node is just our model with bound tools. | |
185 | We create a graph with `assistant` and `tools` nodes. | |
186 | | |
187 | We add a `tools_condition` edge, which routes to `END` or to `tools` based on whether the `assistant` calls a tool. | 
188 | | |
189 | Now, we add one new step: | |
190 | | |
191 | We connect the `tools` node back to the `assistant`, forming a loop. | |
192 | | |
193 | - After the `assistant` node executes, `tools_condition` checks if the model's output is a tool call. | |
194 | - If it is a tool call, the flow is directed to the `tools` node. | |
195 | - The `tools` node connects back to `assistant`. | |
196 | - This loop continues as long as the model decides to call tools. | |
197 | - If the model response is not a tool call, the flow is directed to END, terminating the process. | |
198 | | |
199 |  | |
200 | | |
201 | ## The Butler in Action | |
202 | | |
203 | ### Example 1: Simple Calculations | |
204 | | |
205 | Here is an example to show a simple use case of an agent using a tool in LangGraph. | |
206 | | |
207 | ```python | |
208 | messages = [HumanMessage(content="Divide 6790 by 5")] | |
209 | messages = react_graph.invoke({"messages": messages, "input_file": None}) | |
210 | | |
211 | # Show the messages | |
212 | for m in messages['messages']: | |
213 | m.pretty_print() | |
214 | ``` | |
215 | | |
216 | The conversation would proceed: | |
217 | | |
218 | ``` | |
219 | Human: Divide 6790 by 5 | |
220 | | |
221 | AI Tool Call: divide(a=6790, b=5) | |
222 | | |
223 | Tool Response: 1358.0 | |
224 | | |
225 | Alfred: The result of dividing 6790 by 5 is 1358.0. | |
226 | ``` | |
227 | | |
228 | ### Example 2: Analyzing Master Wayne's Training Documents | |
229 | | |
230 | When Master Wayne leaves his training and meal notes: | |
231 | | |
232 | ```python | |
233 | messages = [HumanMessage(content="According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu?")] | |
234 | messages = react_graph.invoke({"messages": messages, "input_file": "Batman_training_and_meals.png"}) | |
235 | ``` | |
236 | | |
237 | The interaction would proceed: | |
238 | | |
239 | ``` | |
240 | Human: According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu? | |
241 | | |
242 | AI Tool Call: extract_text(img_path="Batman_training_and_meals.png") | |
243 | | |
244 | Tool Response: [Extracted text with training schedule and menu details] | |
245 | | |
246 | Alfred: For the dinner menu, you should buy the following items: | |
247 | | |
248 | 1. Grass-fed local sirloin steak | |
249 | 2. Organic spinach | |
250 | 3. Piquillo peppers | |
251 | 4. Potatoes (for oven-baked golden herb potato) | |
252 | 5. Fish oil (2 grams) | |
253 | | |
254 | Ensure the steak is grass-fed and the spinach and peppers are organic for the best quality meal. | |
255 | ``` | |
256 | | |
257 | ## Key Takeaways | |
258 | | |
259 | Should you wish to create your own document analysis butler, here are key considerations: | |
260 | | |
261 | 1. **Define clear tools** for specific document-related tasks | |
262 | 2. **Create a robust state tracker** to maintain context between tool calls | |
263 | 3. **Consider error handling** for tool failures | 
264 | 4. **Maintain contextual awareness** of previous interactions (ensured by the `add_messages` operator) | 
265 | | |
266 | With these principles, you too can provide exemplary document analysis service worthy of Wayne Manor. | |
267 | | |
268 | *I trust this explanation has been satisfactory. Now, if you'll excuse me, Master Wayne's cape requires pressing before tonight's activities.* | |
269 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to `LangGraph` | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/LangGraph.png" alt="Unit 2.3 Thumbnail"/> | |
4 | | |
5 | Welcome to this next part of our journey, where you'll learn **how to build applications** using the [`LangGraph`](https://github.com/langchain-ai/langgraph) framework, which is designed to help you structure and orchestrate complex LLM workflows. | 
6 | | |
7 | `LangGraph` is a framework that allows you to build **production-ready** applications by giving you **control** over the flow of your agent. | 
8 | | |
9 | ## Module Overview | |
10 | | |
11 | In this unit, you'll discover: | |
12 | | |
13 | ### 1️⃣ [What is LangGraph, and when to use it?](./when_to_use_langgraph) | |
14 | ### 2️⃣ [Building Blocks of LangGraph](./building_blocks) | |
15 | ### 3️⃣ [Alfred, the mail sorting butler](./first_graph) | |
16 | ### 4️⃣ [Alfred, the document Analyst agent](./document_analysis_agent) | |
17 | ### 5️⃣ [Quiz](./quiz1) | 
18 | | |
19 | <Tip warning={true}> | |
20 | The examples in this section require access to a powerful LLM/VLM model. We ran them using the GPT-4o API because it has the best compatibility with LangGraph. | 
21 | </Tip> | |
22 | | |
23 | By the end of this unit, you'll be equipped to build robust, organized, and production-ready applications! | 
24 | | |
25 | That being said, this section is an introduction to LangGraph, and more advanced topics can be found in the free LangChain Academy course: [Introduction to LangGraph](https://academy.langchain.com/courses/intro-to-langgraph) | 
26 | | |
27 | Let's get started! | |
28 | | |
29 | ## Resources | |
30 | | |
31 | - [LangGraph Agents](https://langchain-ai.github.io/langgraph/) - Examples of LangGraph agents | 
32 | - [LangChain academy](https://academy.langchain.com/courses/intro-to-langgraph) - Full course on LangGraph from LangChain | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/quiz1.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Test Your Understanding of LangGraph | |
2 | | |
3 | Let's test your understanding of `LangGraph` with a quick quiz! This will help reinforce the key concepts we've covered so far. | |
4 | | |
5 | This is an optional quiz and it's not graded. | |
6 | | |
7 | ### Q1: What is the primary purpose of LangGraph? | |
8 | Which statement best describes what LangGraph is designed for? | |
9 | | |
10 | <Question | |
11 | choices={[ | |
12 | { | |
13 | text: "A framework to build control flows for applications containing LLMs", | |
14 | explain: "Correct! LangGraph is specifically designed to help build and manage the control flow of applications that use LLMs.", | |
15 | correct: true | |
16 | }, | |
17 | { | |
18 | text: "A library that provides interfaces to interact with different LLM models", | |
19 | explain: "This better describes LangChain's role, which provides standard interfaces for model interaction. LangGraph focuses on control flow.", | |
20 | }, | |
21 | { | |
22 | text: "An Agent library for tool calling", | |
23 | explain: "While LangGraph works with agents, the main purpose of LangGraph is orchestration of the control flow.", | 
24 | } | |
25 | ]} | |
26 | /> | |
27 | | |
28 | --- | |
29 | | |
30 | ### Q2: In the context of the "Control vs Freedom" trade-off, where does LangGraph stand? | |
31 | Which statement best characterizes LangGraph's approach to agent design? | |
32 | | |
33 | <Question | |
34 | choices={[ | |
35 | { | |
36 | text: "LangGraph maximizes freedom, allowing LLMs to make all decisions independently", | |
37 | explain: "LangGraph actually focuses more on control than freedom, providing structure for LLM workflows.", | |
38 | }, | |
39 | { | |
40 | text: "LangGraph provides strong control over execution flow while still leveraging LLM capabilities for decision making", | |
41 | explain: "Correct! LangGraph shines when you need control over your agent's execution, providing predictable behavior through structured workflows.", | |
42 | correct: true | |
43 | }, | |
44 | ]} | |
45 | /> | |
46 | | |
47 | --- | |
48 | | |
49 | ### Q3: What role does State play in LangGraph? | |
50 | Choose the most accurate description of State in LangGraph. | |
51 | | |
52 | <Question | |
53 | choices={[ | |
54 | { | |
55 | text: "State is the latest generation from the LLM", | |
56 | explain: "State is a user-defined class in LangGraph, not something the LLM generates. Its fields are user-defined; their values can be filled by the LLM.", | 
57 | }, | |
58 | { | |
59 | text: "State is only used to track errors during execution", | |
60 | explain: "State has a much broader purpose than just error tracking, although tracking errors in it can still be useful.", | 
61 | }, | |
62 | { | |
63 | text: "State represents the information that flows through your agent application", | |
64 | explain: "Correct! State is central to LangGraph and contains all the information needed for decision-making between steps. You define the fields you need, and nodes can update their values to decide on branching.", | 
65 | correct: true | |
66 | }, | |
67 | { | |
68 | text: "State is only relevant when working with external APIs", | |
69 | explain: "State is fundamental to all LangGraph applications, not just those working with external APIs.", | |
70 | } | |
71 | ]} | |
72 | /> | |
73 | | |
74 | ### Q4: What is a Conditional Edge in LangGraph? | |
75 | Select the most accurate description. | |
76 | | |
77 | <Question | |
78 | choices={[ | |
79 | { | |
80 | text: "An edge that determines which node to execute next based on evaluating a condition", | |
81 | explain: "Correct! Conditional edges allow your graph to make dynamic routing decisions based on the current state, creating branching logic in your workflow.", | |
82 | correct: true | |
83 | }, | |
84 | { | |
85 | text: "An edge that is only followed when a specific condition occurs", | |
86 | explain: "Conditional edges evaluate the current state (typically a node's output) to choose the next node; they are not simply switched on by an external event.", | 
87 | }, | |
88 | { | |
89 | text: "An edge that requires user confirmation before proceeding", | |
90 | explain: "Conditional edges are based on programmatic conditions, not user interaction requirements.", | |
91 | } | |
92 | ]} | |
93 | /> | |
94 | | |
95 | --- | |
96 | | |
97 | ### Q5: How does LangGraph help address the hallucination problem in LLMs? | |
98 | Choose the best answer. | |
99 | | |
100 | <Question | |
101 | choices={[ | |
102 | { | |
103 | text: "LangGraph eliminates hallucinations entirely by limiting LLM responses", | |
104 | explain: "No framework can completely eliminate hallucinations from LLMs, LangGraph is no exception.", | |
105 | }, | |
106 | { | |
107 | text: "LangGraph provides structured workflows that can validate and verify LLM outputs", | |
108 | explain: "Correct! By creating structured workflows with validation steps, verification nodes, and error handling paths, LangGraph helps reduce the impact of hallucinations.", | |
109 | correct: true | |
110 | }, | |
111 | { | |
112 | text: "LangGraph has no effect on hallucinations", | |
113 | explain: "LangGraph's structured approach to workflows can help significantly in mitigating hallucinations at the cost of speed.", | |
114 | } | |
115 | ]} | |
116 | /> | |
117 | | |
118 | Congratulations on completing the quiz! 🎉 If you missed any questions, consider reviewing the previous sections to strengthen your understanding. Next, we'll explore more advanced features of LangGraph and see how to build more complex agent workflows. | |
119 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/langgraph/when_to_use_langgraph.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # What is `LangGraph`? | |
2 | | |
3 | `LangGraph` is a framework developed by [LangChain](https://www.langchain.com/) **to manage the control flow of applications that integrate an LLM**. | |
4 | | |
5 | ## Is `LangGraph` different from `LangChain`? | |
6 | | |
7 | LangChain provides a standard interface to interact with models and other components, useful for retrieval, LLM calls and tool calls. | 
8 | The classes from LangChain might be used in LangGraph, but do not HAVE to be used. | |
9 | | |
10 | The packages are different and can be used in isolation, but, in practice, most resources you will find online use both packages hand in hand. | 
11 | | |
12 | ## When should I use `LangGraph`? | |
13 | ### Control vs freedom | |
14 | | |
15 | When designing AI applications, you face a fundamental trade-off between **control** and **freedom**: | |
16 | | |
17 | - **Freedom** gives your LLM more room to be creative and tackle unexpected problems. | |
18 | - **Control** allows you to ensure predictable behavior and maintain guardrails. | |
19 | | |
20 | Code Agents, like the ones you can encounter in *smolagents*, are very free. They can call multiple tools in a single action step, create their own tools, etc. However, this behavior can make them less predictable and less controllable than a regular Agent working with JSON! | |
21 | | |
22 | `LangGraph` is at the other end of the spectrum: it shines when you need **control** over the execution of your agent. | 
23 | | |
24 | LangGraph is particularly valuable when you need **Control over your applications**. It gives you the tools to build an application that follows a predictable process while still leveraging the power of LLMs. | |
25 | | |
26 | Put simply, if your application involves a series of steps that need to be orchestrated in a specific way, with decisions being made at each junction point, **LangGraph provides the structure you need**. | |
27 | | |
28 | As an example, let's say we want to build an LLM assistant that can answer some questions over some documents. | |
29 | | |
30 | Since LLMs understand text the best, before being able to answer the question, you will need to convert other complex modalities (charts, tables) into text. However, that choice depends on the type of document you have! | |
31 | | |
32 | This is a branching that I chose to represent as follows: | 
33 | | |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/flow.png" alt="Control flow"/> | |
35 | | |
36 | > 💡 **Tip:** The left branch is not an agent, as no tool call is involved, but the right branch will need to write some code to query the XLS file (convert it to a pandas DataFrame and manipulate it). | 
37 | | |
38 | While this branching is deterministic, you can also design branches that are conditioned on the output of an LLM, making them non-deterministic. The deterministic part could be as simple as the sketch below. | 
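
As a rough illustration (with purely hypothetical names, and no LangGraph involved yet), the deterministic routing could just dispatch on the file type:

```python
def route_document(path: str) -> str:
    # Deterministic branching: the file extension decides the processing path.
    if path.endswith(".pdf"):
        return "convert_pdf_to_text"
    if path.endswith((".xls", ".xlsx")):
        return "query_spreadsheet_with_pandas"
    return "pass_text_directly"

print(route_document("quarterly_report.xlsx"))  # query_spreadsheet_with_pandas
```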
39 | | |
40 | The key scenarios where LangGraph excels include: | |
41 | | |
42 | - **Multi-step reasoning processes** that need explicit control on the flow | |
43 | - **Applications requiring persistence of state** between steps | |
44 | - **Systems that combine deterministic logic with AI capabilities** | |
45 | - **Workflows that need human-in-the-loop interventions** | |
46 | - **Complex agent architectures** with multiple components working together | |
47 | | |
48 | In essence, if you can, **as a human**, design in advance a flow of actions where each step's output determines what to execute next, then LangGraph is the right framework for you! | 
49 | | |
50 | `LangGraph` is, in my opinion, the most production-ready agent framework on the market. | |
51 | | |
52 | ## How does LangGraph work? | |
53 | | |
54 | At its core, `LangGraph` uses a directed graph structure to define the flow of your application: | |
55 | | |
56 | - **Nodes** represent individual processing steps (like calling an LLM, using a tool, or making a decision). | |
57 | - **Edges** define the possible transitions between steps. | |
58 | - **State** is user-defined, maintained, and passed between nodes during execution. When deciding which node to target next, the current state is what we look at. | 
59 | | |
60 | We will explore those fundamental blocks more in the next chapter! | |
61 | | |
62 | ## How is it different from regular Python? Why do I need LangGraph? | 
63 | | |
64 | You might wonder: "I could just write regular Python code with if-else statements to handle all these flows, right?" | |
65 | | |
66 | While technically true, LangGraph offers **some advantages** over vanilla Python for building complex systems. You could build the same application without LangGraph, but LangGraph provides tools and abstractions that make it easier. | 
67 | | |
68 | It includes states, visualization, logging (traces), built-in human-in-the-loop, and more. | |
69 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/README.md: | |
-------------------------------------------------------------------------------- | |
1 | # Table of Contents | |
2 | | |
3 | This LlamaIndex framework outline is part of Unit 2 of the course. You can access the Unit 2 material about LlamaIndex on hf.co/learn 👉 <a href="https://hf.co/learn/agents-course/unit2/llama-index/introduction">here</a> | 
4 | | |
5 | | Title | Description | | |
6 | | --- | --- | | |
7 | | [Introduction](introduction.mdx) | Introduction to LlamaIndex | | |
8 | | [LlamaHub](llama-hub.mdx) | LlamaHub: a registry of integrations, agents and tools | | |
9 | | [Components](components.mdx) | Components: the building blocks of workflows | | |
10 | | [Tools](tools.mdx) | Tools: how to build tools in LlamaIndex | | |
11 | | [Quiz 1](quiz1.mdx) | Quiz 1 | | |
12 | | [Agents](agents.mdx) | Agents: how to build agents in LlamaIndex | | |
13 | | [Workflows](workflows.mdx) | Workflows: a sequence of steps and events, built from components, that are executed in order | | 
14 | | [Quiz 2](quiz2.mdx) | Quiz 2 | | |
15 | | [Conclusion](conclusion.mdx) | Conclusion | | |
16 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/agents.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Using Agents in LlamaIndex | |
2 | | |
3 | Remember Alfred, our helpful butler agent from earlier? Well, he's about to get an upgrade! | |
4 | Now that we understand the tools available in LlamaIndex, we can give Alfred new capabilities to serve us better. | |
5 | | |
6 | But before we continue, let's remind ourselves what makes an agent like Alfred tick. | |
7 | Back in Unit 1, we learned that: | |
8 | | |
9 | > An Agent is a system that leverages an AI model to interact with its environment to achieve a user-defined objective. It combines reasoning, planning, and action execution (often via external tools) to fulfil tasks. | |
10 | | |
11 | LlamaIndex supports **three main types of reasoning agents:** | |
12 | | |
13 |  | |
14 | | |
15 | 1. `Function Calling Agents` - These work with AI models that can call specific functions. | |
16 | 2. `ReAct Agents` - These can work with any AI model that exposes a chat or text completion endpoint and can handle complex reasoning tasks. | 
17 | 3. `Advanced Custom Agents` - These use more advanced methods to deal with more complex tasks and workflows. | 
18 | | |
19 | <Tip>Find more information on advanced agents on <a href="https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/workflow/base_agent.py">BaseWorkflowAgent</a></Tip> | |
20 | | |
21 | ## Initialising Agents | |
22 | | |
23 | <Tip> | |
24 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
25 | </Tip> | |
26 | | |
27 | To create an agent, we start by providing it with a **set of functions/tools that define its capabilities**. | |
28 | Let's look at how to create an agent with some basic tools. As of this writing, the agent will automatically use the function calling API (if available), or a standard ReAct agent loop. | |
29 | | |
30 | LLMs that support a tools/functions API are relatively new, but they provide a powerful way to call tools by avoiding specific prompting and allowing the LLM to create tool calls based on provided schemas. | |
31 | | |
32 | ReAct agents are also good at complex reasoning tasks and can work with any LLM that has chat or text completion capabilities. They are more verbose, and show the reasoning behind certain actions that they take. | |
33 | | |
34 | ```python | |
35 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI | |
36 | from llama_index.core.agent.workflow import AgentWorkflow | |
37 | from llama_index.core.tools import FunctionTool | |
38 | | |
39 | # define sample Tool -- type annotations, function names, and docstrings, are all included in parsed schemas! | |
40 | def multiply(a: int, b: int) -> int: | |
41 | """Multiplies two integers and returns the resulting integer""" | |
42 | return a * b | |
43 | | |
44 | # initialize llm | |
45 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct") | |
46 | | |
47 | # initialize agent | |
48 | agent = AgentWorkflow.from_tools_or_functions( | |
49 | [FunctionTool.from_defaults(multiply)], | |
50 | llm=llm | |
51 | ) | |
52 | ``` | |
53 | | |
54 | **Agents are stateless by default**; remembering past interactions is opt-in, using a `Context` object. | 
55 | This might be useful if you want to use an agent that needs to remember previous interactions, like a chatbot that maintains context across multiple messages or a task manager that needs to track progress over time. | |
56 | | |
57 | ```python | |
58 | # stateless | |
59 | response = await agent.run("What is 2 times 2?") | |
60 | | |
61 | # remembering state | |
62 | from llama_index.core.workflow import Context | |
63 | | |
64 | ctx = Context(agent) | |
65 | | |
66 | response = await agent.run("My name is Bob.", ctx=ctx) | |
67 | response = await agent.run("What was my name again?", ctx=ctx) | |
68 | ``` | |
69 | | |
70 | You'll notice that agents in `LlamaIndex` are async because they use Python's `await` operator. If you are new to async code in Python, or need a refresher, they have an [excellent async guide](https://docs.llamaindex.ai/en/stable/getting_started/async_python/). | |
71 | | |
72 | Now that we've covered the basics, let's take a look at how we can use more complex tools in our agents. | 
73 | | |
74 | ## Creating RAG Agents with QueryEngineTools | |
75 | | |
76 | **Agentic RAG is a powerful way to use agents to answer questions about your data.** We can pass various tools to Alfred to help him answer questions. | |
77 | However, instead of answering the question on top of documents automatically, Alfred can decide to use any other tool or flow to answer the question. | |
78 | | |
79 |  | |
80 | | |
81 | It is easy to **wrap `QueryEngine` as a tool** for an agent. | |
82 | When doing so, we need to **define a name and description**. The LLM will use this information to correctly use the tool. | |
83 | Let's see how to load in a `QueryEngineTool` using the `QueryEngine` we created in the [component section](components). | |
84 | | |
85 | ```python | |
86 | from llama_index.core.tools import QueryEngineTool | |
87 | | |
88 | query_engine = index.as_query_engine(llm=llm, similarity_top_k=3) # as shown in the Components in LlamaIndex section | |
89 | | |
90 | query_engine_tool = QueryEngineTool.from_defaults( | |
91 | query_engine=query_engine, | |
92 | name="name", | |
93 | description="a specific description", | |
94 | return_direct=False, | |
95 | ) | |
96 | query_engine_agent = AgentWorkflow.from_tools_or_functions( | |
97 | [query_engine_tool], | |
98 | llm=llm, | |
99 | system_prompt="You are a helpful assistant that has access to a database containing persona descriptions. " | |
100 | ) | |
101 | ``` | |
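
Running this RAG agent looks the same as before; the question below is only a placeholder, since the results depend on the index you built in the component section:

```python
# Run inside an async context (e.g. a notebook cell)
response = await query_engine_agent.run(
    "Search the database for 'science fiction' and return some persona descriptions."
)
print(response)
```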
102 | | |
103 | ## Creating Multi-agent systems | |
104 | | |
105 | The `AgentWorkflow` class also directly supports multi-agent systems. By giving each agent a name and description, the system maintains a single active speaker, with each agent having the ability to hand off to another agent. | |
106 | | |
107 | By narrowing the scope of each agent, we can help increase their general accuracy when responding to user messages. | |
108 | | |
109 | **Agents in LlamaIndex can also directly be used as tools** for other agents, for more complex and custom scenarios. | |
110 | | |
111 | ```python | |
112 | from llama_index.core.agent.workflow import ( | |
113 | AgentWorkflow, | |
114 | FunctionAgent, | |
115 | ReActAgent, | |
116 | ) | |
117 | | |
118 | # Define some tools | |
119 | def add(a: int, b: int) -> int: | |
120 | """Add two numbers.""" | |
121 | return a + b | |
122 | | |
123 | | |
124 | def subtract(a: int, b: int) -> int: | |
125 | """Subtract two numbers.""" | |
126 | return a - b | |
127 | | |
128 | | |
129 | # Create agent configs | |
130 | # NOTE: we can use FunctionAgent or ReActAgent here. | |
131 | # FunctionAgent works for LLMs with a function calling API. | |
132 | # ReActAgent works for any LLM. | |
133 | calculator_agent = ReActAgent( | |
134 | name="calculator", | |
135 | description="Performs basic arithmetic operations", | |
136 | system_prompt="You are a calculator assistant. Use your tools for any math operation.", | |
137 | tools=[add, subtract], | |
138 | llm=llm, | |
139 | ) | |
140 | | |
141 | query_agent = ReActAgent( | |
142 | name="info_lookup", | |
143 | description="Looks up information about XYZ", | |
144 | system_prompt="Use your tool to query a RAG system to answer information about XYZ", | |
145 | tools=[query_engine_tool], | |
146 | llm=llm | |
147 | ) | |
148 | | |
149 | # Create and run the workflow | |
150 | agent = AgentWorkflow( | |
151 | agents=[calculator_agent, query_agent], root_agent="calculator" | |
152 | ) | |
153 | | |
154 | # Run the system | |
155 | response = await agent.run(user_msg="Can you add 5 and 3?") | |
156 | ``` | |
157 | | |
158 | <Tip>Haven't learned enough yet? There is a lot more to discover about agents and tools in LlamaIndex within the <a href="https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_basic/">AgentWorkflow Basic Introduction</a> or the <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/">Agent Learning Guide</a>, where you can read more about streaming, context serialization, and human-in-the-loop!</Tip> | |
159 | | |
160 | Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can use LlamaIndex to **create configurable and manageable workflows!** | |
161 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion | |
2 | | |
3 | Congratulations on finishing the `llama-index` module of this second Unit 🥳 | |
4 | | |
5 | You’ve just mastered the fundamentals of `llama-index` and you’ve seen how to build your own agentic workflows! | |
6 | Now that you have skills in `llama-index`, you can start to create search engines that will solve tasks you're interested in. | |
7 | | |
8 | In the next module of the unit, you're going to learn **how to build Agents with LangGraph**. | |
9 | | |
10 | Finally, we would love **to hear what you think of the course and how we can improve it**. | |
11 | If you have any feedback, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog) | 
12 | | |
13 | ### Keep Learning, and stay awesome 🤗 | |
14 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to LlamaIndex | |
2 | | |
3 | Welcome to this module, where you’ll learn how to build LLM-powered agents using the [LlamaIndex](https://www.llamaindex.ai/) toolkit. | |
4 | | |
5 | LlamaIndex is **a complete toolkit for creating LLM-powered agents over your data using indexes and workflows**. For this course we'll focus on three main parts that help build agents in LlamaIndex: **Components**, **Agents and Tools** and **Workflows**. | |
6 | | |
7 |  | |
8 | | |
9 | Let's look at these key parts of LlamaIndex and how they help with agents: | |
10 | | |
11 | - **Components**: The basic building blocks you use in LlamaIndex. These include things like prompts, models, and databases. Components often help connect LlamaIndex with other tools and libraries. | 
12 | - **Tools**: Tools are components that provide specific capabilities like searching, calculating, or accessing external services. They are the building blocks that enable agents to perform tasks. | |
13 | - **Agents**: Agents are autonomous components that can use tools and make decisions. They coordinate tool usage to accomplish complex goals. | |
14 | - **Workflows**: Step-by-step processes that combine components and logic into an ordered sequence. Workflows, or agentic workflows, are a way to structure agentic behaviour without the explicit use of agents. | 
15 | | |
16 | | |
17 | ## What Makes LlamaIndex Special? | |
18 | | |
19 | While LlamaIndex does some things similar to other frameworks like smolagents, it has some key benefits: | |
20 | | |
21 | - **Clear Workflow System**: Workflows help break down how agents should make decisions step by step using an event-driven and async-first syntax. This helps you clearly compose and organize your logic. | |
22 | - **Advanced Document Parsing with LlamaParse**: LlamaParse was made specifically for LlamaIndex, so the integration is seamless, although it is a paid feature. | |
23 | - **Many Ready-to-Use Components**: LlamaIndex has been around for a while, so it works with lots of other frameworks. This means it has many tested and reliable components, like LLMs, retrievers, indexes, and more. | |
24 | - **LlamaHub**: is a registry of hundreds of these components, agents, and tools that you can use within LlamaIndex. | |
25 | | |
26 | All of these concepts are required in different scenarios to create useful agents. | |
27 | In the following sections, we will go over each of these concepts in detail. | |
28 | After mastering the concepts, we will use our learnings to **create applied use cases with Alfred the agent**! | |
29 | | |
30 | Getting our hands on LlamaIndex is exciting, right? So, what are we waiting for? Let's get started with **finding and installing the integrations we need using LlamaHub! 🚀** | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/llama-hub.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to the LlamaHub | |
2 | | |
3 | **LlamaHub is a registry of hundreds of integrations, agents and tools that you can use within LlamaIndex.** | |
4 | | |
5 |  | |
6 | | |
7 | We will be using various integrations in this course, so let's first look at the LlamaHub and how it can help us. | |
8 | | |
9 | Let's see how to find and install the dependencies for the components we need. | |
10 | | |
11 | ## Installation | |
12 | | |
13 | LlamaIndex installation instructions are available as a well-structured **overview on [LlamaHub](https://llamahub.ai/)**. | |
14 | This might be a bit overwhelming at first, but most of the **installation commands generally follow an easy-to-remember format**: | |
15 | | |
16 | ```bash | |
17 | pip install llama-index-{component-type}-{framework-name} | |
18 | ``` | |
19 | | |
20 | Let's try to install the dependencies for an LLM and embedding component using the [Hugging Face inference API integration](https://llamahub.ai/l/llms/llama-index-llms-huggingface-api?from=llms). | |
21 | | |
22 | ```bash | |
23 | pip install llama-index-llms-huggingface-api llama-index-embeddings-huggingface | |
24 | ``` | |
25 | | |
26 | ## Usage | |
27 | | |
28 | Once installed, we can see the usage patterns. You'll notice that the import paths follow the install command! | |
29 | Underneath, we can see an example of the usage of **the Hugging Face inference API for an LLM component**. | |
30 | | |
31 | ```python | |
32 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI | |
33 | import os | |
34 | from dotenv import load_dotenv | |
35 | | |
36 | # Load the .env file | |
37 | load_dotenv() | |
38 | | |
39 | # Retrieve HF_TOKEN from the environment variables | |
40 | hf_token = os.getenv("HF_TOKEN") | |
41 | | |
42 | llm = HuggingFaceInferenceAPI( | |
43 | model_name="Qwen/Qwen2.5-Coder-32B-Instruct", | |
44 | temperature=0.7, | |
45 | max_tokens=100, | |
46 | token=hf_token, | |
47 | ) | |
48 | | |
49 | response = llm.complete("Hello, how are you?") | |
50 | print(response) | |
51 | # I am good, how can I help you today? | |
52 | ``` | |
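
Since we also installed the embeddings integration above, the embedding component follows the same pattern. The snippet below is only a brief sketch; the model name is just an example:

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model (downloaded from the Hugging Face Hub)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

embedding = embed_model.get_text_embedding("Hello, how are you?")
print(len(embedding))  # dimensionality of the embedding vector, e.g. 384
```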
53 | | |
54 | Wonderful, we now know how to find, install and use the integrations for the components we need. | |
55 | **Let's dive deeper into the components** and see how we can use them to build our own agents. | |
56 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/quiz1.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Small Quiz (ungraded) [[quiz1]] | |
2 | | |
3 | So far we've discussed the key components and tools used in LlamaIndex. | |
4 | It's time to make a short quiz, since **testing yourself** is the best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf). | |
5 | This will help you find **where you need to reinforce your knowledge**. | |
6 | | |
7 | This is an optional quiz and it's not graded. | |
8 | | |
9 | ### Q1: What is a QueryEngine? | |
10 | Which of the following best describes a QueryEngine component? | |
11 | | |
12 | <Question | |
13 | choices={[ | |
14 | { | |
15 | text: "A system that only processes static text without any retrieval capabilities.", | |
16 | explain: "A QueryEngine must be able to retrieve and process relevant information.", | |
17 | }, | |
18 | { | |
19 | text: "A component that finds and retrieves relevant information as part of the RAG process.", | |
20 | explain: "This captures the core purpose of a QueryEngine component.", | |
21 | correct: true | |
22 | }, | |
23 | { | |
24 | text: "A tool that only stores vector embeddings without search functionality.", | |
25 | explain: "A QueryEngine does more than just store embeddings - it actively searches and retrieves information.", | |
26 | }, | |
27 | { | |
28 | text: "A component that only evaluates response quality.", | |
29 | explain: "Evaluation is separate from the QueryEngine's main retrieval purpose.", | |
30 | } | |
31 | ]} | |
32 | /> | |
33 | | |
34 | --- | |
35 | | |
36 | ### Q2: What is the Purpose of FunctionTools? | |
37 | Why are FunctionTools important for an Agent? | |
38 | | |
39 | <Question | |
40 | choices={[ | |
41 | { | |
42 | text: "To handle large amounts of data storage.", | |
43 | explain: "FunctionTools are not primarily for data storage.", | |
44 | }, | |
45 | { | |
46 | text: "To convert Python functions into tools that an agent can use.", | |
47 | explain: "FunctionTools wrap Python functions to make them accessible to agents.", | |
48 | correct: true | |
49 | }, | |
50 | { | |
51 | text: "To allow agents to create random function definitions.", | 
52 | explain: "FunctionTools serve the specific purpose of making functions available to agents.", | |
53 | }, | |
54 | { | |
55 | text: "To only process text data.", | |
56 | explain: "FunctionTools can work with various types of functions, not just text processing.", | |
57 | } | |
58 | ]} | |
59 | /> | |
60 | | |
61 | --- | |
62 | | |
63 | ### Q3: What are Toolspecs in LlamaIndex? | |
64 | What is the main purpose of Toolspecs? | |
65 | | |
66 | <Question | |
67 | choices={[ | |
68 | { | |
69 | text: "They are redundant components that don't add functionality.", | |
70 | explain: "Toolspecs serve an important purpose in the LlamaIndex ecosystem.", | |
71 | }, | |
72 | { | |
73 | text: "They are sets of community-created tools that extend agent capabilities.", | |
74 | explain: "Toolspecs allow the community to share and reuse tools.", | |
75 | correct: true | |
76 | }, | |
77 | { | |
78 | text: "They are used solely for memory management.", | |
79 | explain: "Toolspecs are about providing tools, not managing memory.", | |
80 | }, | |
81 | { | |
82 | text: "They only work with text processing.", | |
83 | explain: "Toolspecs can include various types of tools, not just text processing.", | |
84 | } | |
85 | ]} | |
86 | /> | |
87 | | |
88 | --- | |
89 | | |
90 | ### Q4: What is Required to create a tool? | |
91 | What information must be included when creating a tool? | |
92 | | |
93 | <Question | |
94 | choices={[ | |
95 | { | |
96 | text: "A function, a name, and description must be defined.", | |
97 | explain: "While these all make up a tool, the name and description can be parsed from the function and docstring.", | |
98 | }, | |
99 | { | |
100 | text: "Only the name is required.", | |
101 | explain: "A function and description/docstring is also required for proper tool documentation.", | |
102 | }, | |
103 | { | |
104 | text: "Only the description is required.", | |
105 | explain: "A function is required so that we have code to run when an agent selects a tool", | |
106 | }, | |
107 | { | |
108 | text: "Only the function is required.", | |
109 | explain: "The name and description default to the name and docstring from the provided function", | |
110 | correct: true | |
111 | } | |
112 | ]} | |
113 | /> | |
114 | | |
115 | --- | |
116 | | |
117 | Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce your knowledge. If you passed, you're ready to dive deeper into building with these components! | |
118 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/quiz2.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Quick Self-Check (ungraded) [[quiz2]] | |
2 | | |
3 | What?! Another Quiz? We know, we know, ... 😅 But this short, ungraded quiz is here to **help you reinforce key concepts you've just learned**. | |
4 | | |
5 | This quiz covers agent workflows and interactions - essential components for building effective AI agents. | |
6 | | |
7 | ### Q1: What is the purpose of AgentWorkflow in LlamaIndex? | |
8 | | |
9 | <Question | |
10 | choices={[ | |
11 | { | |
12 | text: "To run one or more agents with tools", | |
13 | explain: "Yes, the AgentWorkflow is the main way to quickly create a system with one or more agents.", | |
14 | correct: true | |
15 | }, | |
16 | { | |
17 | text: "To create a single agent that can query your data without memory", | |
18 | explain: "No, the AgentWorkflow is more capable than that, the QueryEngine is for simple queries over your data.", | |
19 | }, | |
20 | { | |
21 | text: "To automatically build tools for agents", | |
22 | explain: "The AgentWorkflow does not build tools, that is the job of the developer.", | |
23 | }, | |
24 | { | |
25 | text: "To manage agent memory and state", | |
26 | explain: "Managing memory and state is not the primary purpose of AgentWorkflow.", | |
27 | } | |
28 | ]} | |
29 | /> | |
30 | | |
31 | --- | |
32 | | |
33 | ### Q2: What object is used for keeping track of the state of the workflow? | |
34 | | |
35 | <Question | |
36 | choices={[ | |
37 | { | |
38 | text: "State", | |
39 | explain: "State is not the correct object for workflow state management.", | |
40 | }, | |
41 | { | |
42 | text: "Context", | |
43 | explain: "Context is the correct object used for keeping track of workflow state.", | |
44 | correct: true | |
45 | }, | |
46 | { | |
47 | text: "WorkflowState", | |
48 | explain: "WorkflowState is not the correct object.", | |
49 | }, | |
50 | { | |
51 | text: "Management", | |
52 | explain: "Management is not a valid object for workflow state.", | |
53 | } | |
54 | ]} | |
55 | /> | |
56 | | |
57 | --- | |
58 | | |
59 | ### Q3: Which method should be used if you want an agent to remember previous interactions? | |
60 | | |
61 | <Question | |
62 | choices={[ | |
63 | { | |
64 | text: "run(query_str)", | |
65 | explain: ".run(query_str) does not maintain conversation history.", | |
66 | }, | |
67 | { | |
68 | text: "chat(query_str, ctx=ctx)", | |
69 | explain: "chat() is not a valid method on workflows.", | |
70 | }, | |
71 | { | |
72 | text: "interact(query_str)", | |
73 | explain: "interact() is not a valid method for agent interactions.", | |
74 | }, | |
75 | { | |
76 | text: "run(query_str, ctx=ctx)", | |
77 | explain: "By passing in and maintaining the context, we can maintain state!", | |
78 | correct: true | |
79 | } | |
80 | ]} | |
81 | /> | |
82 | | |
83 | --- | |
84 | | |
85 | ### Q4: What is a key feature of Agentic RAG? | |
86 | | |
87 | <Question | |
88 | choices={[ | |
89 | { | |
90 | text: "It can only use document-based tools, to answer questions in a RAG workflow", | |
91 | explain: "Agentic RAG can use different tools, including document-based tools.", | |
92 | }, | |
93 | { | |
94 | text: "It automatically answers questions without tools, like a chatbot", | |
95 | explain: "Agentic RAG does use tools to answer questions.", | |
96 | }, | |
97 | { | |
98 | text: "It can decide to use any tool to answer questions, including RAG tools", | |
99 | explain: "Agentic RAG has the flexibility to use different tools to answer questions.", | |
100 | correct: true | |
101 | }, | |
102 | { | |
103 | text: "It only works with Function Calling Agents", | |
104 | explain: "Agentic RAG is not limited to Function Calling Agents.", | |
105 | } | |
106 | ]} | |
107 | /> | |
108 | | |
109 | --- | |
110 | | |
111 | | |
112 | Got it? Great! Now let's **do a brief recap of the unit!** | |
113 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/tools.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Using Tools in LlamaIndex | |
2 | | |
3 | **Defining a clear set of Tools is crucial to performance.** As we discussed in [unit 1](../../unit1/tools), clear tool interfaces are easier for LLMs to use. | |
4 | Much like a software API designed for human engineers, an LLM gets more out of a tool when it's easy to understand how it works. | |
5 | | |
6 | There are **four main types of tools in LlamaIndex**: | |
7 | | |
8 |  | |
9 | | |
10 | 1. `FunctionTool`: Convert any Python function into a tool that an agent can use. It automatically figures out how the function works. | |
11 | 2. `QueryEngineTool`: A tool that lets agents use query engines. Since agents are built on query engines, they can also use other agents as tools. | |
12 | 3. `Toolspecs`: Sets of tools created by the community, which often include tools for specific services like Gmail. | |
13 | 4. `Utility Tools`: Special tools that help handle large amounts of data from other tools. | |
14 | | |
15 | We will go over each of them in more detail below. | |
16 | | |
17 | ## Creating a FunctionTool | |
18 | | |
19 | <Tip> | |
20 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/tools.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
21 | </Tip> | |
22 | | |
23 | A FunctionTool provides a simple way to wrap any Python function and make it available to an agent. | |
24 | You can pass either a synchronous or asynchronous function to the tool, along with optional `name` and `description` parameters. | |
25 | The name and description are particularly important as they help the agent understand when and how to use the tool effectively. | |
26 | Let's look at how to create a FunctionTool below and then call it. | |
27 | | |
28 | ```python | |
29 | from llama_index.core.tools import FunctionTool | |
30 | | |
31 | def get_weather(location: str) -> str: | |
32 | """Useful for getting the weather for a given location.""" | |
33 | print(f"Getting weather for {location}") | |
34 | return f"The weather in {location} is sunny" | |
35 | | |
36 | tool = FunctionTool.from_defaults( | |
37 | get_weather, | |
38 | name="my_weather_tool", | |
39 | description="Useful for getting the weather for a given location.", | |
40 | ) | |
41 | tool.call("New York") | |
42 | ``` | |
43 | | |
44 | <Tip>When using an agent or LLM with function calling, the tool selected (and the arguments written for that tool) rely strongly on the tool name and description of the purpose and arguments of the tool. Learn more about function calling in the <a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/modules/function_calling.html">Function Calling Guide</a> and <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/function_calling.html">Function Calling Learning Guide</a>.</Tip> | |
45 | | |
46 | ## Creating a QueryEngineTool | |
47 | | |
48 | The `QueryEngine` we defined in the previous unit can be easily transformed into a tool using the `QueryEngineTool` class. | |
49 | Let's see how to create a `QueryEngineTool` from a `QueryEngine` in the example below. | |
50 | | |
51 | ```python | |
52 | import chromadb | |
52 | from llama_index.core import VectorStoreIndex | |
53 | from llama_index.core.tools import QueryEngineTool | |
54 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI | |
55 | from llama_index.embeddings.huggingface import HuggingFaceEmbedding | |
56 | from llama_index.vector_stores.chroma import ChromaVectorStore | |
57 | | |
58 | embed_model = HuggingFaceEmbedding("BAAI/bge-small-en-v1.5") | |
59 | | |
60 | db = chromadb.PersistentClient(path="./alfred_chroma_db") | |
61 | chroma_collection = db.get_or_create_collection("alfred") | |
62 | vector_store = ChromaVectorStore(chroma_collection=chroma_collection) | |
63 | | |
64 | index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model) | |
65 | | |
66 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct") | |
67 | query_engine = index.as_query_engine(llm=llm) | |
68 | tool = QueryEngineTool.from_defaults(query_engine, name="some useful name", description="some useful description") | |
69 | ``` | |
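
If you want to sanity-check the tool before handing it to an agent, you can call it directly, just as we did with the `FunctionTool` above. A minimal sketch (the query string here is only an illustration):

```python
# Call the QueryEngineTool directly; the string is forwarded to the underlying query engine.
response = tool.call("Tell me something about the indexed documents.")
print(response)
```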
70 | | |
71 | ## Creating Toolspecs | |
72 | | |
73 | Think of `ToolSpecs` as collections of tools that work together harmoniously - like a well-organized professional toolkit. | |
74 | Just as a mechanic's toolkit contains complementary tools that work together for vehicle repairs, a `ToolSpec` combines related tools for specific purposes. | |
75 | For example, an accounting agent's `ToolSpec` might elegantly integrate spreadsheet capabilities, email functionality, and calculation tools to handle financial tasks with precision and efficiency. | |
76 | | |
77 | <details> | |
78 | <summary>Install the Google Toolspec</summary> | |
79 | As introduced in the <a href="./llama-hub">section on the LlamaHub</a>, we can install the Google toolspec with the following command: | |
80 | | |
81 | ```bash | |
82 | pip install llama-index-tools-google | |
83 | ``` | |
84 | </details> | |
85 | | |
86 | And now we can load the toolspec and convert it to a list of tools. | |
87 | | |
88 | ```python | |
89 | from llama_index.tools.google import GmailToolSpec | |
90 | | |
91 | tool_spec = GmailToolSpec() | |
92 | tool_spec_list = tool_spec.to_tool_list() | |
93 | ``` | |
94 | | |
95 | To get a more detailed view of the tools, we can take a look at the `metadata` of each tool. | |
96 | | |
97 | ```python | |
98 | [(tool.metadata.name, tool.metadata.description) for tool in tool_spec_list] | |
99 | ``` | |
100 | | |
101 | ### Model Context Protocol (MCP) in LlamaIndex | |
102 | | |
103 | LlamaIndex also allows using MCP tools through a [ToolSpec on the LlamaHub](https://llamahub.ai/l/tools/llama-index-tools-mcp?from=). | |
104 | You can simply run an MCP server and start using it through the following implementation. | |
105 | | |
106 | ```python | |
107 | from llama_index.core.workflow import Context | |
107 | from llama_index.tools.mcp import BasicMCPClient, McpToolSpec | |
108 | | |
109 | # We assume an MCP server is running on 127.0.0.1:8000; you can also point the client at your own MCP server. | |
110 | mcp_client = BasicMCPClient("http://127.0.0.1:8000/sse") | |
111 | mcp_tool = McpToolSpec(client=mcp_client) | |
112 | | |
113 | # get the agent | |
114 | agent = await get_agent(mcp_tool) | |
115 | | |
116 | # create the agent context | |
117 | agent_context = Context(agent) | |
118 | ``` | |
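
Note that `get_agent` is a helper that isn't defined in the snippet above. A rough sketch of what it might look like, assuming `McpToolSpec` exposes its tools via a `to_tool_list_async()` method and reusing the `ReActAgent` class covered later in the workflows section:

```python
from llama_index.core.agent.workflow import ReActAgent
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.tools.mcp import McpToolSpec


async def get_agent(tool_spec: McpToolSpec) -> ReActAgent:
    """Hypothetical helper: fetch the MCP server's tools and wrap them in an agent."""
    # Assumed API: retrieve the tools exposed by the MCP server as a list of tools
    tools = await tool_spec.to_tool_list_async()

    llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
    return ReActAgent(
        name="mcp_agent",
        description="An agent that can call the tools exposed by the MCP server.",
        system_prompt="You are a helpful assistant that uses the available MCP tools.",
        tools=tools,
        llm=llm,
    )
```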
119 | | |
120 | ## Utility Tools | |
121 | | |
122 | Oftentimes, directly querying an API **can return an excessive amount of data**, some of which may be irrelevant, overflow the context window of the LLM, or unnecessarily increase the number of tokens that you are using. | |
123 | Let's walk through our two main utility tools below. | |
124 | | |
125 | 1. `OnDemandToolLoader`: This tool turns any existing LlamaIndex data loader (BaseReader class) into a tool that an agent can use. The tool can be called with all the parameters needed to trigger `load_data` from the data loader, along with a natural language query string. During execution, we first load data from the data loader, index it (for instance with a vector store), and then query it 'on-demand'. All three of these steps happen in a single tool call. | |
126 | 2. `LoadAndSearchToolSpec`: The LoadAndSearchToolSpec takes in any existing Tool as input. As a tool spec, it implements `to_tool_list`, and when that function is called, two tools are returned: a loading tool and then a search tool. The load Tool execution calls the underlying Tool and then indexes the output (by default with a vector index). The search Tool execution takes in a query string as input and calls the underlying index (see the sketch below). | |
127 | | |
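As a quick illustration of the second pattern, here is a hedged sketch that wraps one of the Gmail tools from above with `LoadAndSearchToolSpec` (the import path is assumed from recent LlamaIndex versions and may differ in yours):

```python
# Assumed import path; check your LlamaIndex version if it fails.
from llama_index.core.tools.tool_spec.load_and_search import LoadAndSearchToolSpec

# Wrap one tool from the Gmail toolspec defined earlier (index 0 chosen arbitrarily).
# to_tool_list() then returns two tools: one that loads (and indexes) the data,
# and one that searches the resulting index with a query string.
wrapped_tools = LoadAndSearchToolSpec.from_defaults(tool_spec_list[0]).to_tool_list()

print([tool.metadata.name for tool in wrapped_tools])
```
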
128 | <Tip>You can find toolspecs and utility tools on the <a href="https://llamahub.ai/">LlamaHub</a></Tip> | |
129 | | |
130 | Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can **use LlamaIndex to create configurable and manageable workflows!** | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/llama-index/workflows.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Creating agentic workflows in LlamaIndex | |
2 | | |
3 | A workflow in LlamaIndex provides a structured way to organize your code into sequential and manageable steps. | |
4 | | |
5 | Such a workflow is created by defining `Steps` which are triggered by `Events`, and themselves emit `Events` to trigger further steps. | |
6 | Let's take a look at Alfred showing a LlamaIndex workflow for a RAG task. | |
7 | | |
8 |  | |
9 | | |
10 | **Workflows offer several key benefits:** | |
11 | | |
12 | - Clear organization of code into discrete steps | |
13 | - Event-driven architecture for flexible control flow | |
14 | - Type-safe communication between steps | |
15 | - Built-in state management | |
16 | - Support for both simple and complex agent interactions | |
17 | | |
18 | As you might have guessed, **workflows strike a great balance between agent autonomy and control over the overall workflow.** | |
19 | | |
20 | So, let's learn how to create a workflow ourselves! | |
21 | | |
22 | ## Creating Workflows | |
23 | | |
24 | <Tip> | |
25 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/workflows.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
26 | </Tip> | |
27 | | |
28 | ### Basic Workflow Creation | |
29 | | |
30 | <details> | |
31 | <summary>Install the Workflow package</summary> | |
32 | As introduced in the <a href="./llama-hub">section on the LlamaHub</a>, we can install the Workflow package with the following command: | |
33 | | |
34 | ```bash | |
35 | pip install llama-index-utils-workflow | |
36 | ``` | |
37 | </details> | |
38 | | |
39 | We can create a single-step workflow by defining a class that inherits from `Workflow` and decorating its step functions with `@step`. | |
40 | We will also need to add `StartEvent` and `StopEvent`, which are special events that are used to indicate the start and end of the workflow. | |
41 | | |
42 | ```python | |
43 | from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step | |
44 | | |
45 | class MyWorkflow(Workflow): | |
46 | @step | |
47 | async def my_step(self, ev: StartEvent) -> StopEvent: | |
48 | # do something here | |
49 | return StopEvent(result="Hello, world!") | |
50 | | |
51 | | |
52 | w = MyWorkflow(timeout=10, verbose=False) | |
53 | result = await w.run() | |
54 | ``` | |
55 | | |
56 | As you can see, we can now run the workflow by calling `w.run()`. | |
57 | | |
58 | ### Connecting Multiple Steps | |
59 | | |
60 | To connect multiple steps, we **create custom events that carry data between steps.** | |
61 | To do so, we need to add an `Event` that is passed between the steps and transfers the output of the first step to the second step. | |
62 | | |
63 | ```python | |
64 | from llama_index.core.workflow import Event | |
65 | | |
66 | class ProcessingEvent(Event): | |
67 | intermediate_result: str | |
68 | | |
69 | class MultiStepWorkflow(Workflow): | |
70 | @step | |
71 | async def step_one(self, ev: StartEvent) -> ProcessingEvent: | |
72 | # Process initial data | |
73 | return ProcessingEvent(intermediate_result="Step 1 complete") | |
74 | | |
75 | @step | |
76 | async def step_two(self, ev: ProcessingEvent) -> StopEvent: | |
77 | # Use the intermediate result | |
78 | final_result = f"Finished processing: {ev.intermediate_result}" | |
79 | return StopEvent(result=final_result) | |
80 | | |
81 | w = MultiStepWorkflow(timeout=10, verbose=False) | |
82 | result = await w.run() | |
83 | result | |
84 | ``` | |
85 | | |
86 | The type hinting is important here, as it ensures that the workflow is executed correctly. Let's complicate things a bit more! | |
87 | | |
88 | ### Loops and Branches | |
89 | | |
90 | The type hinting is the most powerful part of workflows because it allows us to create branches, loops, and joins to facilitate more complex workflows. | |
91 | | |
92 | Let's show an example of **creating a loop** by using the union operator `|`. | |
93 | In the example below, we see that the `LoopEvent` is taken as input for the step and can also be returned as output. | |
94 | | |
95 | ```python | |
96 | from llama_index.core.workflow import Event | |
97 | import random | |
98 | | |
99 | | |
100 | class ProcessingEvent(Event): | |
101 | intermediate_result: str | |
102 | | |
103 | | |
104 | class LoopEvent(Event): | |
105 | loop_output: str | |
106 | | |
107 | | |
108 | class MultiStepWorkflow(Workflow): | |
109 | @step | |
110 | async def step_one(self, ev: StartEvent | LoopEvent) -> ProcessingEvent | LoopEvent: | |
111 | if random.randint(0, 1) == 0: | |
112 | print("Bad thing happened") | |
113 | return LoopEvent(loop_output="Back to step one.") | |
114 | else: | |
115 | print("Good thing happened") | |
116 | return ProcessingEvent(intermediate_result="First step complete.") | |
117 | | |
118 | @step | |
119 | async def step_two(self, ev: ProcessingEvent) -> StopEvent: | |
120 | # Use the intermediate result | |
121 | final_result = f"Finished processing: {ev.intermediate_result}" | |
122 | return StopEvent(result=final_result) | |
123 | | |
124 | | |
125 | w = MultiStepWorkflow(verbose=False) | |
126 | result = await w.run() | |
127 | result | |
128 | ``` | |
129 | | |
130 | ### Drawing Workflows | |
131 | | |
132 | We can also visualize workflows. Let's use the `draw_all_possible_flows` function to draw the workflow; it saves the diagram to an HTML file. | |
133 | | |
134 | ```python | |
135 | from llama_index.utils.workflow import draw_all_possible_flows | |
136 | | |
137 | w = ... # as defined in the previous section | |
138 | draw_all_possible_flows(w, "flow.html") | |
139 | ``` | |
140 | | |
141 |  | |
142 | | |
143 | There is one last cool trick that we will cover in the course, which is the ability to add state to the workflow. | |
144 | | |
145 | ### State Management | |
146 | | |
147 | State management is useful when you want to keep track of the state of the workflow, so that every step has access to the same state. | |
148 | We can do this by using the `Context` type hint on top of a parameter in the step function. | |
149 | | |
150 | ```python | |
151 | from llama_index.core.workflow import Context, StartEvent, StopEvent | |
152 | | |
153 | | |
154 | @step | |
155 | async def query(self, ctx: Context, ev: StartEvent) -> StopEvent: | |
156 | # store query in the context | |
157 | await ctx.set("query", "What is the capital of France?") | |
158 | | |
159 | # do something with context and event | |
160 | val = ... | |
161 | | |
162 | # retrieve query from the context | |
163 | query = await ctx.get("query") | |
164 | | |
165 | return StopEvent(result=val) | |
166 | ``` | |
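
To make this more concrete, here is a minimal, self-contained sketch that places such a step inside a complete workflow (the stored query and result strings are invented for illustration):

```python
from llama_index.core.workflow import Context, StartEvent, StopEvent, Workflow, step


class QueryFlow(Workflow):
    @step
    async def query(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # store a value in the shared workflow state
        await ctx.set("query", "What is the capital of France?")

        # any other step in the workflow could read or update the same context here

        # retrieve the value from the shared state again
        query = await ctx.get("query")
        return StopEvent(result=f"Stored query was: {query}")


w = QueryFlow(timeout=10, verbose=False)
result = await w.run()
print(result)
```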
167 | | |
168 | Great! Now you know how to create basic workflows in LlamaIndex! | |
169 | | |
170 | <Tip>There are some more complex nuances to workflows, which you can learn about in <a href="https://docs.llamaindex.ai/en/stable/understanding/workflows/">the LlamaIndex documentation</a>.</Tip> | |
171 | | |
172 | However, there is another way to create workflows, which relies on the `AgentWorkflow` class. Let's take a look at how we can use this to create a multi-agent workflow. | |
173 | | |
174 | ## Automating workflows with Multi-Agent Workflows | |
175 | | |
176 | Instead of manual workflow creation, we can use the **`AgentWorkflow` class to create a multi-agent workflow**. | |
177 | The `AgentWorkflow` uses Workflow Agents to allow you to create a system of one or more agents that can collaborate and hand off tasks to each other based on their specialized capabilities. | |
178 | This enables building complex agent systems where different agents handle different aspects of a task. | |
179 | Instead of importing classes from `llama_index.core.agent`, we will import the agent classes from `llama_index.core.agent.workflow`. | |
180 | One agent must be designated as the root agent in the `AgentWorkflow` constructor. | |
181 | When a user message comes in, it is first routed to the root agent. | |
182 | | |
183 | Each agent can then: | |
184 | | |
185 | - Handle the request directly using their tools | |
186 | - Hand off to another agent better suited for the task | |
187 | - Return a response to the user | |
188 | | |
189 | Let's see how to create a multi-agent workflow. | |
190 | | |
191 | ```python | |
192 | from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent | |
193 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI | |
194 | | |
195 | # Define some tools | |
196 | def add(a: int, b: int) -> int: | |
197 | """Add two numbers.""" | |
198 | return a + b | |
199 | | |
200 | def multiply(a: int, b: int) -> int: | |
201 | """Multiply two numbers.""" | |
202 | return a * b | |
203 | | |
204 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct") | |
205 | | |
206 | # we can pass functions directly without FunctionTool -- the fn/docstring are parsed for the name/description | |
207 | multiply_agent = ReActAgent( | |
208 | name="multiply_agent", | |
209 | description="Is able to multiply two integers", | |
210 | system_prompt="A helpful assistant that can use a tool to multiply numbers.", | |
211 | tools=[multiply], | |
212 | llm=llm, | |
213 | ) | |
214 | | |
215 | addition_agent = ReActAgent( | |
216 | name="add_agent", | |
217 | description="Is able to add two integers", | |
218 | system_prompt="A helpful assistant that can use a tool to add numbers.", | |
219 | tools=[add], | |
220 | llm=llm, | |
221 | ) | |
222 | | |
223 | # Create the workflow | |
224 | workflow = AgentWorkflow( | |
225 | agents=[multiply_agent, addition_agent], | |
226 | root_agent="multiply_agent", | |
227 | ) | |
228 | | |
229 | # Run the system | |
230 | response = await workflow.run(user_msg="Can you add 5 and 3?") | |
231 | ``` | |
232 | | |
233 | Agent tools can also modify the workflow state we mentioned earlier. Before starting the workflow, we can provide an initial state dict that will be available to all agents. | |
234 | The state is stored in the `state` key of the workflow context and is injected into the `state_prompt`, which augments each new user message. | |
235 | | |
236 | Let's inject a counter to count function calls by modifying the previous example: | |
237 | | |
238 | ```python | |
239 | from llama_index.core.workflow import Context | |
240 | | |
241 | # Define some tools | |
242 | async def add(ctx: Context, a: int, b: int) -> int: | |
243 | """Add two numbers.""" | |
244 | # update our count | |
245 | cur_state = await ctx.get("state") | |
246 | cur_state["num_fn_calls"] += 1 | |
247 | await ctx.set("state", cur_state) | |
248 | | |
249 | return a + b | |
250 | | |
251 | async def multiply(ctx: Context, a: int, b: int) -> int: | |
252 | """Multiply two numbers.""" | |
253 | # update our count | |
254 | cur_state = await ctx.get("state") | |
255 | cur_state["num_fn_calls"] += 1 | |
256 | await ctx.set("state", cur_state) | |
257 | | |
258 | return a * b | |
259 | | |
260 | ... | |
261 | | |
262 | workflow = AgentWorkflow( | |
263 | agents=[multiply_agent, addition_agent], | |
264 | root_agent="multiply_agent" | |
265 | initial_state={"num_fn_calls": 0}, | |
266 | state_prompt="Current state: {state}. User message: {msg}", | |
267 | ) | |
268 | | |
269 | # run the workflow with context | |
270 | ctx = Context(workflow) | |
271 | response = await workflow.run(user_msg="Can you add 5 and 3?", ctx=ctx) | |
272 | | |
273 | # pull out and inspect the state | |
274 | state = await ctx.get("state") | |
275 | print(state["num_fn_calls"]) | |
276 | ``` | |
277 | | |
278 | Congratulations! You have now mastered the basics of Agents in LlamaIndex! 🎉 | |
279 | | |
280 | Let's continue with one final quiz to solidify your knowledge! 🚀 | |
281 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion | |
2 | | |
3 | Congratulations on finishing the `smolagents` module of this second Unit 🥳 | |
4 | | |
5 | You’ve just mastered the fundamentals of `smolagents` and built your own Agent! With these skills, you can start creating Agents that solve the tasks you're interested in. | |
6 | | |
7 | In the next module, you're going to learn **how to build Agents with LlamaIndex**. | |
8 | | |
9 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog) | |
10 | | |
11 | ### Keep Learning, stay awesome 🤗 | |
12 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/final_quiz.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Exam Time! | |
2 | | |
3 | Well done on working through the material on `smolagents`! You've already achieved a lot. Now, it's time to put your knowledge to the test with a quiz. 🧠 | |
4 | | |
5 | ## Instructions | |
6 | | |
7 | - The quiz consists of code questions. | |
8 | - You will be given instructions to complete the code snippets. | |
9 | - Read the instructions carefully and complete the code snippets accordingly. | |
10 | - For each question, you will be given the result and some feedback. | |
11 | | |
12 | 🧘 **This quiz is ungraded and uncertified**. It's about you understanding the `smolagents` library and knowing whether you should spend more time on the written material. In the coming units you'll put this knowledge to the test in use cases and projects. | |
13 | | |
14 | Let's get started! | |
15 | | |
16 | ## Quiz 🚀 | |
17 | | |
18 | <iframe | |
19 | src="https://agents-course-unit2-smolagents-quiz.hf.space" | |
20 | frameborder="0" | |
21 | width="850" | |
22 | height="450" | |
23 | ></iframe> | |
24 | | |
25 | You can also access the quiz 👉 [here](https://huggingface.co/spaces/agents-course/unit2_smolagents_quiz) | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to `smolagents` | |
2 | | |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/thumbnail.jpg" alt="Unit 2.1 Thumbnail"/> | |
4 | | |
5 | Welcome to this module, where you'll learn **how to build effective agents** using the [`smolagents`](https://github.com/huggingface/smolagents) library, which provides a lightweight framework for creating capable AI agents. | |
6 | | |
7 | `smolagents` is a Hugging Face library; therefore, we would appreciate your support by **starring** the smolagents [`repository`](https://github.com/huggingface/smolagents): | |
8 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/star_smolagents.gif" alt="starring smolagents"/> | |
9 | | |
10 | ## Module Overview | |
11 | | |
12 | This module provides a comprehensive overview of key concepts and practical strategies for building intelligent agents using `smolagents`. | |
13 | | |
14 | With so many open-source frameworks available, it's essential to understand the components and capabilities that make `smolagents` a useful option or to determine when another solution might be a better fit. | |
15 | | |
16 | We'll explore critical agent types, including code agents designed for software development tasks, tool calling agents for creating modular, function-driven workflows, and retrieval agents that access and synthesize information. | |
17 | | |
18 | Additionally, we'll cover the orchestration of multiple agents as well as the integration of vision capabilities and web browsing, which unlock new possibilities for dynamic and context-aware applications. | |
19 | | |
20 | In this unit, Alfred, the agent from Unit 1, makes his return. This time, he’s using the `smolagents` framework for his internal workings. Together, we’ll explore the key concepts behind this framework as Alfred tackles various tasks. Alfred is organizing a party at the Wayne Manor while the Wayne family 🦇 is away, and he has plenty to do. Join us as we showcase his journey and how he handles these tasks with `smolagents`! | |
21 | | |
22 | <Tip> | |
23 | | |
24 | In this unit, you will learn to build AI agents with the `smolagents` library. Your agents will be able to search for data, execute code, and interact with web pages. You will also learn how to combine multiple agents to create more powerful systems. | |
25 | | |
26 | </Tip> | |
27 | | |
28 |  | |
29 | | |
30 | ## Contents | |
31 | | |
32 | During this unit on `smolagents`, we cover: | |
33 | | |
34 | ### 1️⃣ [Why Use smolagents](./why_use_smolagents) | |
35 | | |
36 | `smolagents` is one of the many open-source agent frameworks available for application development. Alternative options include `LlamaIndex` and `LangGraph`, which are also covered in other modules in this course. `smolagents` offers several key features that might make it a great fit for specific use cases, but we should always consider all options when selecting a framework. We'll explore the advantages and drawbacks of using `smolagents`, helping you make an informed decision based on your project's requirements. | |
37 | | |
38 | ### 2️⃣ [CodeAgents](./code_agents) | |
39 | | |
40 | `CodeAgents` are the primary type of agent in `smolagents`. Instead of generating JSON or text, these agents produce Python code to perform actions. This module explores their purpose, functionality, and how they work, along with hands-on examples to showcase their capabilities. | |
41 | | |
42 | ### 3️⃣ [ToolCallingAgents](./tool_calling_agents) | |
43 | | |
44 | `ToolCallingAgents` are the second type of agent supported by `smolagents`. Unlike `CodeAgents`, which generate Python code, these agents rely on JSON/text blobs that the system must parse and interpret to execute actions. This module covers their functionality, their key differences from `CodeAgents`, and provides an example to illustrate their usage. | |
45 | | |
46 | ### 4️⃣ [Tools](./tools) | |
47 | | |
48 | As we saw in Unit 1, tools are functions that an LLM can use within an agentic system, and they act as the essential building blocks for agent behavior. This module covers how to create tools, their structure, and different implementation methods using the `Tool` class or the `@tool` decorator. You'll also learn about the default toolbox, how to share tools with the community, and how to load community-contributed tools for use in your agents. | |
49 | | |
50 | ### 5️⃣ [Retrieval Agents](./retrieval_agents) | |
51 | | |
52 | Retrieval agents give models access to knowledge bases, making it possible to search, synthesize, and retrieve information from multiple sources. They leverage vector stores for efficient retrieval and implement **Retrieval-Augmented Generation (RAG)** patterns. These agents are particularly useful for integrating web search with custom knowledge bases while maintaining conversation context through memory systems. This module explores implementation strategies, including fallback mechanisms for robust information retrieval. | |
53 | | |
54 | ### 6️⃣ [Multi-Agent Systems](./multi_agent_systems) | |
55 | | |
56 | Orchestrating multiple agents effectively is crucial for building powerful, multi-agent systems. By combining agents with different capabilities—such as a web search agent with a code execution agent—you can create more sophisticated solutions. This module focuses on designing, implementing, and managing multi-agent systems to maximize efficiency and reliability. | |
57 | | |
58 | ### 7️⃣ [Vision and Browser agents](./vision_agents) | |
59 | | |
60 | Vision agents extend traditional agent capabilities by incorporating **Vision-Language Models (VLMs)**, enabling them to process and interpret visual information. This module explores how to design and integrate VLM-powered agents, unlocking advanced functionalities like image-based reasoning, visual data analysis, and multimodal interactions. We will also use vision agents to build a browser agent that can browse the web and extract information from it. | |
61 | | |
62 | ## Resources | |
63 | | |
64 | - [smolagents Documentation](https://huggingface.co/docs/smolagents) - Official docs for the smolagents library | |
65 | - [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) - Research paper on agent architectures | |
66 | - [Agent Guidelines](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - Best practices for building reliable agents | |
67 | - [LangGraph Agents](https://langchain-ai.github.io/langgraph/) - Additional examples of agent implementations | |
68 | - [Function Calling Guide](https://platform.openai.com/docs/guides/function-calling) - Understanding function calling in LLMs | |
69 | - [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) - Guide to implementing effective RAG | |
70 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/quiz1.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Small Quiz (ungraded) [[quiz1]] | |
2 | | |
3 | Let's test your understanding of `smolagents` with a quick quiz! Remember, testing yourself helps reinforce learning and identify areas that may need review. | |
4 | | |
5 | This is an optional quiz and it's not graded. | |
6 | | |
7 | ### Q1: What is one of the primary advantages of choosing `smolagents` over other frameworks? | |
8 | Which statement best captures a core strength of the `smolagents` approach? | |
9 | | |
10 | <Question | |
11 | choices={[ | |
12 | { | |
13 | text: "It uses highly specialized configuration files and a steep learning curve to ensure only expert developers can use it", | |
14 | explain: "smolagents is designed for simplicity and minimal code complexity, not steep learning curves.", | |
15 | }, | |
16 | { | |
17 | text: "It supports a code-first approach with minimal abstractions, letting agents interact directly via Python function calls", | |
18 | explain: "Yes, smolagents emphasizes a straightforward, code-centric design with minimal abstractions.", | |
19 | correct: true | |
20 | }, | |
21 | { | |
22 | text: "It focuses on JSON-based actions, removing the need for agents to write any code", | |
23 | explain: "While smolagents supports JSON-based tool calls (ToolCallingAgents), the library emphasizes code-based approaches with CodeAgents.", | |
24 | }, | |
25 | { | |
26 | text: "It deeply integrates with a single LLM provider and specialized hardware", | |
27 | explain: "smolagents supports multiple model providers and does not require specialized hardware.", | |
28 | } | |
29 | ]} | |
30 | /> | |
31 | | |
32 | --- | |
33 | | |
34 | ### Q2: In which scenario would you likely benefit most from using smolagents? | |
35 | Which situation aligns well with what smolagents does best? | |
36 | | |
37 | <Question | |
38 | choices={[ | |
39 | { | |
40 | text: "Prototyping or experimenting quickly with agent logic, particularly when your application is relatively straightforward", | |
41 | explain: "Yes. smolagents is designed for simple and nimble agent creation without extensive setup overhead.", | |
42 | correct: true | |
43 | }, | |
44 | { | |
45 | text: "Building a large-scale enterprise system where you need dozens of microservices and real-time data pipelines", | |
46 | explain: "While possible, smolagents is more focused on lightweight, code-centric experimentation rather than heavy enterprise infrastructure.", | |
47 | }, | |
48 | { | |
49 | text: "Needing a framework that only supports cloud-based LLMs and forbids local inference", | |
50 | explain: "smolagents offers flexible integration with local or hosted models, not exclusively cloud-based LLMs.", | |
51 | }, | |
52 | { | |
53 | text: "A scenario that requires advanced orchestration, multi-modal perception, and enterprise-scale features out-of-the-box", | |
54 | explain: "While you can integrate advanced capabilities, smolagents itself is lightweight and minimal at its core.", | |
55 | } | |
56 | ]} | |
57 | /> | |
58 | | |
59 | --- | |
60 | | |
61 | ### Q3: smolagents offers flexibility in model integration. Which statement best reflects its approach? | |
62 | Choose the most accurate description of how smolagents interoperates with LLMs. | |
63 | | |
64 | <Question | |
65 | choices={[ | |
66 | { | |
67 | text: "It only provides a single built-in model and does not allow custom integrations", | |
68 | explain: "smolagents supports multiple different backends and user-defined models.", | |
69 | }, | |
70 | { | |
71 | text: "It requires you to implement your own model connector for every LLM usage", | |
72 | explain: "There are multiple prebuilt connectors that make LLM integration straightforward.", | |
73 | }, | |
74 | { | |
75 | text: "It only integrates with open-source LLMs but not commercial APIs", | |
76 | explain: "smolagents can integrate with both open-source and commercial model APIs.", | |
77 | }, | |
78 | { | |
79 | text: "It can be used with a wide range of LLMs, offering predefined classes like TransformersModel, HfApiModel, and LiteLLMModel", | |
80 | explain: "This is correct. smolagents supports flexible model integration through various classes.", | |
81 | correct: true | |
82 | } | |
83 | ]} | |
84 | /> | |
85 | | |
86 | --- | |
87 | | |
88 | ### Q4: How does smolagents handle the debate between code-based actions and JSON-based actions? | |
89 | Which statement correctly characterizes smolagents' philosophy about action formats? | |
90 | | |
91 | <Question | |
92 | choices={[ | |
93 | { | |
94 | text: "It only allows JSON-based actions for all agent tasks, requiring a parser to extract the tool calls", | |
95 | explain: "ToolCallingAgent uses JSON-based calls, but smolagents also provides a primary CodeAgent option that writes Python code.", | |
96 | }, | |
97 | { | |
98 | text: "It focuses on code-based actions via a CodeAgent but also supports JSON-based tool calls with a ToolCallingAgent", | |
99 | explain: "Yes, smolagents primarily recommends code-based actions but includes a JSON-based alternative for users who prefer it or need it.", | |
100 | correct: true | |
101 | }, | |
102 | { | |
103 | text: "It disallows any external function calls, instead requiring all logic to reside entirely within the LLM", | |
104 | explain: "smolagents is specifically designed to grant LLMs the ability to call tools or code externally.", | |
105 | }, | |
106 | { | |
107 | text: "It requires users to manually convert every code snippet into a JSON object before running the agent", | |
108 | explain: "smolagents can automatically manage code snippet creation within the CodeAgent path, no manual JSON conversion necessary.", | |
109 | } | |
110 | ]} | |
111 | /> | |
112 | | |
113 | --- | |
114 | | |
115 | ### Q5: How does smolagents integrate with the Hugging Face Hub for added benefits? | |
116 | Which statement accurately describes one of the core advantages of Hub integration? | |
117 | | |
118 | <Question | |
119 | choices={[ | |
120 | { | |
121 | text: "It automatically upgrades all public models to commercial license tiers", | |
122 | explain: "Hub integration doesn't change the license tier for models or tools.", | |
123 | }, | |
124 | { | |
125 | text: "It disables local inference entirely, forcing remote model usage only", | |
126 | explain: "Users can still do local inference if they prefer; pushing to the Hub doesn't override local usage.", | |
127 | }, | |
128 | { | |
129 | text: "It allows you to push and share agents or tools, making them easily discoverable and reusable by other developers", | |
130 | explain: "smolagents supports uploading agents and tools to the HF Hub for others to reuse.", | |
131 | correct: true | |
132 | }, | |
133 | { | |
134 | text: "It permanently stores all your code-based agents, preventing any updates or versioning", | |
135 | explain: "Hub repositories support updates and version control, so you can revise your code-based agents any time.", | |
136 | } | |
137 | ]} | |
138 | /> | |
139 | | |
140 | --- | |
141 | | |
142 | Congratulations on completing this quiz! 🎉 If you missed any questions, consider reviewing the *Why use smolagents* section for a deeper understanding. If you did well, you're ready to explore more advanced topics in smolagents! | |
143 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/quiz2.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Small Quiz (ungraded) [[quiz2]] | |
2 | | |
3 | It's time to test your understanding of the *Code Agents*, *Tool Calling Agents*, and *Tools* sections. This quiz is optional and not graded. | |
4 | | |
5 | --- | |
6 | | |
7 | ### Q1: What is the key difference between creating a tool with the `@tool` decorator versus creating a subclass of `Tool` in smolagents? | |
8 | | |
9 | Which statement best describes the distinction between these two approaches for defining tools? | |
10 | | |
11 | <Question | |
12 | choices={[ | |
13 | { | |
14 | text: "Using the <code>@tool</code> decorator is mandatory for retrieval-based tools, while subclasses of <code>Tool</code> are only for text-generation tasks", | |
15 | explain: "Both approaches can be used for any type of tool, including retrieval-based or text-generation tools.", | |
16 | }, | |
17 | { | |
18 | text: "The <code>@tool</code> decorator is recommended for simple function-based tools, while subclasses of <code>Tool</code> offer more flexibility for complex functionality or custom metadata", | |
19 | explain: "This is correct. The decorator approach is simpler, but subclassing allows more customized behavior.", | |
20 | correct: true | |
21 | }, | |
22 | { | |
23 | text: "<code>@tool</code> can only be used in multi-agent systems, while creating a <code>Tool</code> subclass is for single-agent scenarios", | |
24 | explain: "All agents (single or multi) can use either approach to define tools; there is no such restriction.", | |
25 | }, | |
26 | { | |
27 | text: "Decorating a function with <code>@tool</code> replaces the need for a docstring, whereas subclasses must not include docstrings", | |
28 | explain: "Both methods benefit from clear docstrings. The decorator doesn't replace them, and a subclass can still have docstrings.", | |
29 | } | |
30 | ]} | |
31 | /> | |
32 | | |
33 | --- | |
34 | | |
35 | ### Q2: How does a CodeAgent handle multi-step tasks using the ReAct (Reason + Act) approach? | |
36 | | |
37 | Which statement correctly describes how the CodeAgent executes a series of steps to solve a task? | |
38 | | |
39 | <Question | |
40 | choices={[ | |
41 | { | |
42 | text: "It passes each step to a different agent in a multi-agent system, then combines results", | |
43 | explain: "Although multi-agent systems can distribute tasks, CodeAgent itself can handle multiple steps on its own using ReAct.", | |
44 | }, | |
45 | { | |
46 | text: "It stores every action in JSON for easy parsing before executing them all at once", | |
47 | explain: "This behavior matches ToolCallingAgent's JSON-based approach, not CodeAgent.", | |
48 | }, | |
49 | { | |
50 | text: "It cycles through writing internal thoughts, generating Python code, executing the code, and logging the results until it arrives at a final answer", | |
51 | explain: "Correct. This describes the ReAct pattern that CodeAgent uses, including iterative reasoning and code execution.", | |
52 | correct: true | |
53 | }, | |
54 | { | |
55 | text: "It relies on a vision module to validate code output before continuing to the next step", | |
56 | explain: "Vision capabilities are supported in smolagents, but they're not a default requirement for CodeAgent or the ReAct approach.", | |
57 | } | |
58 | ]} | |
59 | /> | |
60 | | |
61 | --- | |
62 | | |
63 | ### Q3: Which of the following is a primary advantage of sharing a tool on the Hugging Face Hub? | |
64 | | |
65 | Select the best reason why a developer might upload and share their custom tool. | |
66 | | |
67 | <Question | |
68 | choices={[ | |
69 | { | |
70 | text: "It automatically integrates the tool with a MultiStepAgent for retrieval-augmented generation", | |
71 | explain: "Sharing a tool doesn't automatically set up retrieval or multi-step logic. It's just making the tool available.", | |
72 | }, | |
73 | { | |
74 | text: "It allows others to discover, reuse, and integrate your tool in their smolagents without extra setup", | |
75 | explain: "Yes. Sharing on the Hub makes tools accessible for anyone (including yourself) to download and reuse quickly.", | |
76 | correct: true | |
77 | }, | |
78 | { | |
79 | text: "It ensures that only CodeAgents can invoke the tool while ToolCallingAgents cannot", | |
80 | explain: "Both CodeAgents and ToolCallingAgents can invoke shared tools. There's no restriction by agent type.", | |
81 | }, | |
82 | { | |
83 | text: "It converts your tool into a fully vision-capable function for image processing", | |
84 | explain: "Tool sharing doesn't alter the tool's functionality or add vision capabilities automatically.", | |
85 | } | |
86 | ]} | |
87 | /> | |
88 | | |
89 | --- | |
90 | | |
91 | ### Q4: ToolCallingAgent differs from CodeAgent in how it executes actions. Which statement is correct? | |
92 | | |
93 | Choose the option that accurately describes how ToolCallingAgent works. | |
94 | | |
95 | <Question | |
96 | choices={[ | |
97 | { | |
98 | text: "ToolCallingAgent is only compatible with a multi-agent system, while CodeAgent can run alone", | |
99 | explain: "Either agent can be used alone or as part of a multi-agent system.", | |
100 | }, | |
101 | { | |
102 | text: "ToolCallingAgent delegates all reasoning to a separate retrieval agent, then returns a final answer", | |
103 | explain: "ToolCallingAgent still uses a main LLM for reasoning; it doesn't rely solely on retrieval agents.", | |
104 | }, | |
105 | { | |
106 | text: "ToolCallingAgent outputs JSON instructions specifying tool calls and arguments, which get parsed and executed", | |
107 | explain: "This is correct. ToolCallingAgent uses the JSON approach to define tool calls.", | |
108 | correct: true | |
109 | }, | |
110 | { | |
111 | text: "ToolCallingAgent is only meant for single-step tasks and automatically stops after calling one tool", | |
112 | explain: "ToolCallingAgent can perform multiple steps if needed, just like CodeAgent.", | |
113 | } | |
114 | ]} | |
115 | /> | |
116 | | |
117 | --- | |
118 | | |
119 | ### Q5: What is included in the smolagents default toolbox, and why might you use it? | |
120 | | |
121 | Which statement best captures the purpose and contents of the default toolbox in smolagents? | |
122 | | |
123 | <Question | |
124 | choices={[ | |
125 | { | |
126 | text: "It provides a set of commonly-used tools such as DuckDuckGo search, PythonInterpreterTool, and a final answer tool for quick prototyping", | |
127 | explain: "Correct. The default toolbox contains these ready-made tools for easy integration when building agents.", | |
128 | correct: true | |
129 | }, | |
130 | { | |
131 | text: "It only supports vision-based tasks like image classification or OCR by default", | |
132 | explain: "Although smolagents can integrate vision-based features, the default toolbox isn't exclusively vision-oriented.", | |
133 | }, | |
134 | { | |
135 | text: "It is intended solely for multi-agent systems and is incompatible with a single CodeAgent", | |
136 | explain: "The default toolbox can be used by any agent type, single or multi-agent setups alike.", | |
137 | }, | |
138 | { | |
139 | text: "It adds advanced retrieval-based functionality for large-scale question answering from a vector store", | |
140 | explain: "While you can build retrieval tools, the default toolbox does not automatically provide advanced RAG features.", | |
141 | } | |
142 | ]} | |
143 | /> | |
144 | | |
145 | --- | |
146 | | |
147 | Congratulations on completing this quiz! 🎉 If any questions gave you trouble, revisit the *Code Agents*, *Tool Calling Agents*, or *Tools* sections to strengthen your understanding. If you aced it, you're well on your way to building robust smolagents applications! | |
148 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/retrieval_agents.mdx: | |
-------------------------------------------------------------------------------- | |
1 | <CourseFloatingBanner chapter={2} | |
2 | classNames="absolute z-10 right-0 top-0" | |
3 | notebooks={[ | |
4 | {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/retrieval_agents.ipynb"}, | |
5 | ]} /> | |
6 | | |
7 | # Building Agentic RAG Systems | |
8 | | |
9 | <Tip> | |
10 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/retrieval_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
11 | </Tip> | |
12 | | |
13 | Retrieval Augmented Generation (RAG) systems combine the capabilities of data retrieval and generation models to provide context-aware responses. For example, a user's query is passed to a search engine, and the retrieved results are given to the model along with the query. The model then generates a response based on the query and retrieved information. | |
14 | | |
15 | Agentic RAG (Retrieval-Augmented Generation) extends traditional RAG systems by **combining autonomous agents with dynamic knowledge retrieval**. | |
16 | | |
17 | While traditional RAG systems use an LLM to answer queries based on retrieved data, agentic RAG **enables intelligent control of both retrieval and generation processes**, improving efficiency and accuracy. | |
18 | | |
19 | Traditional RAG systems face key limitations, such as **relying on a single retrieval step** and focusing on direct semantic similarity with the user’s query, which may overlook relevant information. | |
20 | | |
21 | Agentic RAG addresses these issues by allowing the agent to autonomously formulate search queries, critique retrieved results, and conduct multiple retrieval steps for a more tailored and comprehensive output. | |
22 | | |
23 | ## Basic Retrieval with DuckDuckGo | |
24 | | |
25 | Let's build a simple agent that can search the web using DuckDuckGo. This agent will retrieve information and synthesize responses to answer queries. With Agentic RAG, Alfred's agent can: | |
26 | | |
27 | * Search for latest superhero party trends | |
28 | * Refine results to include luxury elements | |
29 | * Synthesize information into a complete plan | |
30 | | |
31 | Here's how Alfred's agent can achieve this: | |
32 | | |
33 | ```python | |
34 | from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel | |
35 | | |
36 | # Initialize the search tool | |
37 | search_tool = DuckDuckGoSearchTool() | |
38 | | |
39 | # Initialize the model | |
40 | model = HfApiModel() | |
41 | | |
42 | agent = CodeAgent( | |
43 | model=model, | |
44 | tools=[search_tool], | |
45 | ) | |
46 | | |
47 | # Example usage | |
48 | response = agent.run( | |
49 | "Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering." | |
50 | ) | |
51 | print(response) | |
52 | ``` | |
53 | | |
54 | The agent follows this process: | |
55 | | |
56 | 1. **Analyzes the Request:** Alfred’s agent identifies the key elements of the query—luxury superhero-themed party planning, with focus on decor, entertainment, and catering. | |
57 | 2. **Performs Retrieval:** The agent leverages DuckDuckGo to search for the most relevant and up-to-date information, ensuring it aligns with Alfred’s refined preferences for a luxurious event. | |
58 | 3. **Synthesizes Information:** After gathering the results, the agent processes them into a cohesive, actionable plan for Alfred, covering all aspects of the party. | |
59 | 4. **Stores for Future Reference:** The agent stores the retrieved information for easy access when planning future events, optimizing efficiency in subsequent tasks. | |
60 | | |
61 | ## Custom Knowledge Base Tool | |
62 | | |
63 | For specialized tasks, a custom knowledge base can be invaluable. Let's create a tool that queries a vector database of technical documentation or specialized knowledge. Using semantic search, the agent can find the most relevant information for Alfred's needs. | |
64 | | |
65 | A vector database stores numerical representations (embeddings) of text or other data, created by machine learning models. It enables semantic search by identifying similar meanings in high-dimensional space. | |
66 | | |
67 | This approach combines predefined knowledge with semantic search to provide context-aware solutions for event planning. With specialized knowledge access, Alfred can perfect every detail of the party. | |
68 | | |
69 | In this example, we'll create a tool that retrieves party planning ideas from a custom knowledge base. We'll use a BM25 retriever (a keyword-based ranking method that doesn't require an embedding model) to search the knowledge base and return the top results, and `RecursiveCharacterTextSplitter` to split the documents into smaller chunks for more efficient search. | |
70 | | |
71 | ```python | |
72 | from langchain.docstore.document import Document | |
73 | from langchain.text_splitter import RecursiveCharacterTextSplitter | |
74 | from smolagents import Tool | |
75 | from langchain_community.retrievers import BM25Retriever | |
76 | from smolagents import CodeAgent, HfApiModel | |
77 | | |
78 | class PartyPlanningRetrieverTool(Tool): | |
79 | name = "party_planning_retriever" | |
80 | description = "Uses semantic search to retrieve relevant party planning ideas for Alfred’s superhero-themed party at Wayne Manor." | |
81 | inputs = { | |
82 | "query": { | |
83 | "type": "string", | |
84 | "description": "The query to perform. This should be a query related to party planning or superhero themes.", | |
85 | } | |
86 | } | |
87 | output_type = "string" | |
88 | | |
89 | def __init__(self, docs, **kwargs): | |
90 | super().__init__(**kwargs) | |
91 | self.retriever = BM25Retriever.from_documents( | |
92 | docs, k=5 # Retrieve the top 5 documents | |
93 | ) | |
94 | | |
95 | def forward(self, query: str) -> str: | |
96 | assert isinstance(query, str), "Your search query must be a string" | |
97 | | |
98 | docs = self.retriever.invoke( | |
99 | query, | |
100 | ) | |
101 | return "\nRetrieved ideas:\n" + "".join( | |
102 | [ | |
103 | f"\n\n===== Idea {str(i)} =====\n" + doc.page_content | |
104 | for i, doc in enumerate(docs) | |
105 | ] | |
106 | ) | |
107 | | |
108 | # Simulate a knowledge base about party planning | |
109 | party_ideas = [ | |
110 | {"text": "A superhero-themed masquerade ball with luxury decor, including gold accents and velvet curtains.", "source": "Party Ideas 1"}, | |
111 | {"text": "Hire a professional DJ who can play themed music for superheroes like Batman and Wonder Woman.", "source": "Entertainment Ideas"}, | |
112 | {"text": "For catering, serve dishes named after superheroes, like 'The Hulk's Green Smoothie' and 'Iron Man's Power Steak.'", "source": "Catering Ideas"}, | |
113 | {"text": "Decorate with iconic superhero logos and projections of Gotham and other superhero cities around the venue.", "source": "Decoration Ideas"}, | |
114 | {"text": "Interactive experiences with VR where guests can engage in superhero simulations or compete in themed games.", "source": "Entertainment Ideas"} | |
115 | ] | |
116 | | |
117 | source_docs = [ | |
118 | Document(page_content=doc["text"], metadata={"source": doc["source"]}) | |
119 | for doc in party_ideas | |
120 | ] | |
121 | | |
122 | # Split the documents into smaller chunks for more efficient search | |
123 | text_splitter = RecursiveCharacterTextSplitter( | |
124 | chunk_size=500, | |
125 | chunk_overlap=50, | |
126 | add_start_index=True, | |
127 | strip_whitespace=True, | |
128 | separators=["\n\n", "\n", ".", " ", ""], | |
129 | ) | |
130 | docs_processed = text_splitter.split_documents(source_docs) | |
131 | | |
132 | # Create the retriever tool | |
133 | party_planning_retriever = PartyPlanningRetrieverTool(docs_processed) | |
134 | | |
135 | # Initialize the agent | |
136 | agent = CodeAgent(tools=[party_planning_retriever], model=HfApiModel()) | |
137 | | |
138 | # Example usage | |
139 | response = agent.run( | |
140 | "Find ideas for a luxury superhero-themed party, including entertainment, catering, and decoration options." | |
141 | ) | |
142 | | |
143 | print(response) | |
144 | ``` | |
145 | | |
146 | This enhanced agent can: | |
147 | 1. First check the knowledge base for relevant information | |
148 | 2. Combine insights from the retrieved ideas into a coherent answer | |
149 | 3. Maintain conversation context in memory across runs (see the sketch below) | |
150 | | |
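As a rough illustration of the third point, the same agent can be queried again without wiping its memory. This is only a sketch: it assumes the `agent` object from the block above is still in scope, and it relies on the `reset` argument of `agent.run`, which recent versions of `smolagents` use to control whether earlier steps are kept.

```python
# Sketch only: reuse the agent built above and keep its memory between runs.
first_plan = agent.run(
    "Find luxury decoration ideas for the superhero-themed party."
)

# `reset=False` keeps the earlier steps in the agent's memory, so the follow-up
# question can build on what was already retrieved (the argument name is an
# assumption about the smolagents version in use).
follow_up = agent.run(
    "Based on the ideas you just found, suggest a matching catering menu.",
    reset=False,
)

print(follow_up)
```
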
151 | ## Enhanced Retrieval Capabilities | |
152 | | |
153 | When building agentic RAG systems, the agent can employ sophisticated strategies like: | |
154 | | |
155 | 1. **Query Reformulation:** Instead of using the raw user query, the agent can craft optimized search terms that better match the target documents | |
156 | 2. **Multi-Step Retrieval:** The agent can perform multiple searches, using initial results to inform subsequent queries | |
157 | 3. **Source Integration:** Information can be combined from multiple sources like web search and local documentation | |
158 | 4. **Result Validation:** Retrieved content can be analyzed for relevance and accuracy before being included in responses | |
159 | | |
160 | Effective agentic RAG systems require careful consideration of several key aspects. The agent **should select between available tools based on the query type and context**. Memory systems help maintain conversation history and avoid repetitive retrievals. Having fallback strategies ensures the system can still provide value even when primary retrieval methods fail. Additionally, implementing validation steps helps ensure the accuracy and relevance of retrieved information. | |
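
For instance, a simple fallback strategy could try the local knowledge base first and only hit the web when nothing useful comes back. The sketch below reuses the `party_planning_retriever` tool built earlier together with `DuckDuckGoSearchTool`; the relevance check is a naive placeholder, not a production heuristic.

```python
from smolagents import DuckDuckGoSearchTool

web_search = DuckDuckGoSearchTool()

def retrieve_with_fallback(query: str) -> str:
    """Try the local party-planning knowledge base first, then fall back to web search."""
    local_results = party_planning_retriever.forward(query)
    # The retriever prefixes each hit with "===== Idea i ====="; no hits means no marker.
    if "=====" in local_results:
        return local_results
    # Fallback: the primary retrieval found nothing, so search the web instead.
    return web_search(query)

print(retrieve_with_fallback("live entertainment for a superhero gala"))
```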
161 | | |
162 | ## Resources | |
163 | | |
164 | - [Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀](https://huggingface.co/learn/cookbook/agent_rag) - Recipe for developing an Agentic RAG system using smolagents. | |
165 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/tool_calling_agents.mdx: | |
-------------------------------------------------------------------------------- | |
1 | <CourseFloatingBanner chapter={2} | |
2 | classNames="absolute z-10 right-0 top-0" | |
3 | notebooks={[ | |
4 | {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tool_calling_agents.ipynb"}, | |
5 | ]} /> | |
6 | | |
7 | # Writing actions as code snippets or JSON blobs | |
8 | | |
9 | <Tip> | |
10 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tool_calling_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab. | |
11 | </Tip> | |
12 | | |
13 | Tool Calling Agents are the second type of agent available in `smolagents`. Unlike Code Agents that use Python snippets, these agents **use the built-in tool-calling capabilities of LLM providers** to generate tool calls as **JSON structures**. This is the standard approach used by OpenAI, Anthropic, and many other providers. | |
14 | | |
15 | Let's look at an example. When Alfred wants to search for catering services and party ideas, a `CodeAgent` would generate and run Python code like this: | |
16 | | |
17 | ```python | |
18 | for query in [ | |
19 | "Best catering services in Gotham City", | |
20 | "Party theme ideas for superheroes" | |
21 | ]: | |
22 | print(web_search(f"Search for: {query}")) | |
23 | ``` | |
24 | | |
25 | A `ToolCallingAgent` would instead create a JSON structure: | |
26 | | |
27 | ```python | |
28 | [ | |
29 | {"name": "web_search", "arguments": "Best catering services in Gotham City"}, | |
30 | {"name": "web_search", "arguments": "Party theme ideas for superheroes"} | |
31 | ] | |
32 | ``` | |
33 | | |
34 | This JSON blob is then used to execute the tool calls. | |
35 | | |
36 | While `smolagents` primarily focuses on `CodeAgents` since [they perform better overall](https://arxiv.org/abs/2402.01030), `ToolCallingAgents` can be effective for simple systems that don't require variable handling or complex tool calls. | |
37 | | |
38 |  | |
39 | | |
40 | ## How Do Tool Calling Agents Work? | |
41 | | |
42 | Tool Calling Agents follow the same multi-step workflow as Code Agents (see the [previous section](./code_agents) for details). | |
43 | | |
44 | The key difference is in **how they structure their actions**: instead of executable code, they **generate JSON objects that specify tool names and arguments**. The system then **parses these instructions** to execute the appropriate tools. | |
45 | | |
46 | ## Example: Running a Tool Calling Agent | |
47 | | |
48 | Let's revisit the previous example where Alfred started party preparations, but this time we'll use a `ToolCallingAgent` to highlight the difference. We'll build an agent that can search the web using DuckDuckGo, just like in our Code Agent example. The only difference is the agent type - the framework handles everything else: | |
49 | | |
50 | ```python | |
51 | from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel | |
52 | | |
53 | agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel()) | |
54 | | |
55 | agent.run("Search for the best music recommendations for a party at Wayne's mansion.") | |
56 | ``` | |
57 | | |
58 | When you examine the agent's trace, instead of seeing `Executing parsed code:`, you'll see something like: | |
59 | | |
60 | ```text | |
61 | ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ | |
62 | │ Calling tool: 'web_search' with arguments: {'query': "best music recommendations for a party at Wayne's │ | |
63 | │ mansion"} │ | |
64 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ | |
65 | ``` | |
66 | | |
67 | The agent generates a structured tool call that the system processes to produce the output, rather than directly executing code like a `CodeAgent`. | |
68 | | |
69 | Now that we understand both agent types, we can choose the right one for our needs. Let's continue exploring `smolagents` to make Alfred's party a success! 🎉 | |
70 | | |
71 | ## Resources | |
72 | | |
73 | - [ToolCallingAgent documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/agents#smolagents.ToolCallingAgent) - Official documentation for ToolCallingAgent | |
74 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit2/smolagents/why_use_smolagents.mdx: | |
-------------------------------------------------------------------------------- | |
1 |  | |
2 | # Why use smolagents | |
3 | | |
4 | In this module, we will explore the pros and cons of using [smolagents](https://huggingface.co/docs/smolagents/en/index), helping you make an informed decision about whether it's the right framework for your needs. | |
5 | | |
6 | ## What is `smolagents`? | |
7 | | |
8 | `smolagents` is a simple yet powerful framework for building AI agents. It provides LLMs with the _agency_ to interact with the real world, such as searching or generating images. | |
9 | | |
10 | As we learned in unit 1, AI agents are programs that use LLMs to generate **'thoughts'** based on **'observations'** to perform **'actions'**. Let's explore how this is implemented in smolagents. | |
11 | | |
12 | ### Key Advantages of `smolagents` | |
13 | - **Simplicity:** Minimal code complexity and abstractions, to make the framework easy to understand, adopt and extend | |
14 | - **Flexible LLM Support:** Works with any LLM through integration with Hugging Face tools and external APIs | |
15 | - **Code-First Approach:** First-class support for Code Agents that write their actions directly in code, removing the need for parsing and simplifying tool calling | |
16 | - **HF Hub Integration:** Seamless integration with the Hugging Face Hub, allowing the use of Gradio Spaces as tools | |
17 | | |
18 | ### When to use smolagents? | |
19 | | |
20 | With these advantages in mind, when should we use smolagents over other frameworks? | |
21 | | |
22 | smolagents is ideal when: | |
23 | - You need a **lightweight and minimal solution.** | |
24 | - You want to **experiment quickly** without complex configurations. | |
25 | - Your **application logic is straightforward.** | |
26 | | |
27 | ### Code vs. JSON Actions | |
28 | Unlike other frameworks where agents write actions in JSON, `smolagents` **focuses on tool calls in code**, simplifying the execution process. This is because there's no need to parse the JSON in order to build code that calls the tools: the output can be executed directly. | |
29 | | |
30 | The following diagram illustrates this difference: | |
31 | | |
32 |  | |
33 | | |
34 | To review the difference between Code vs JSON Actions, you can revisit [the Actions Section in Unit 1](https://huggingface.co/learn/agents-course/unit1/actions#actions-enabling-the-agent-to-engage-with-its-environment). | |
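
As a quick, hypothetical illustration (the `web_search` tool and the query are placeholders), the same action looks like this in each style:

```python
# Code action (CodeAgent): the generated snippet is executed directly,
# so its result can be stored in a variable and reused.
results = web_search(query="Best catering services in Gotham City")
print(results)

# JSON action (ToolCallingAgent): the framework must parse a structure like
# the one below and then invoke the matching tool itself.
# {"name": "web_search", "arguments": {"query": "Best catering services in Gotham City"}}
```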
35 | | |
36 | ### Agent Types in `smolagents` | |
37 | | |
38 | Agents in `smolagents` operate as **multi-step agents**. | |
39 | | |
40 | Each step of a [`MultiStepAgent`](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.MultiStepAgent) performs: | |
41 | - One thought | |
42 | - One tool call and execution | |
43 | | |
44 | In addition to using **[CodeAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.CodeAgent)** as the primary type of agent, smolagents also supports **[ToolCallingAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.ToolCallingAgent)**, which writes tool calls in JSON. | |
45 | | |
46 | We will explore each agent type in more detail in the following sections. | |
47 | | |
48 | <Tip> | |
49 | In smolagents, tools are defined using the <code>@tool</code> decorator wrapping a Python function, or by subclassing the <code>Tool</code> class. | |
50 | </Tip> | |
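
For example, a tool can be declared by decorating a plain Python function; its name, type hints, and docstring become the tool's schema. The `suggest_playlist` function below is purely hypothetical:

```python
from smolagents import tool

@tool
def suggest_playlist(theme: str) -> str:
    """Suggests a playlist name for a themed party.

    Args:
        theme: The theme of the party, e.g. 'superhero'.
    """
    # Hypothetical logic - a real tool would call an API or database here.
    return f"Playlist: '{theme.title()} Anthems', three hours of themed hits."
```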
51 | | |
52 | ### Model Integration in `smolagents` | |
53 | `smolagents` supports flexible LLM integration, allowing you to use any callable model that meets [certain criteria](https://huggingface.co/docs/smolagents/main/en/reference/models). The framework provides several predefined classes to simplify model connections: | |
54 | | |
55 | - **[TransformersModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.TransformersModel):** Implements a local `transformers` pipeline for seamless integration. | |
56 | - **[HfApiModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.HfApiModel):** Supports [serverless inference](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) calls through [Hugging Face's infrastructure](https://huggingface.co/docs/api-inference/index), or via a growing number of [third-party inference providers](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#supported-providers-and-tasks). | |
57 | - **[LiteLLMModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.LiteLLMModel):** Leverages [LiteLLM](https://www.litellm.ai/) for lightweight model interactions. | |
58 | - **[OpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.OpenAIServerModel):** Connects to any service that offers an OpenAI API interface. | |
59 | - **[AzureOpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.AzureOpenAIServerModel):** Supports integration with any Azure OpenAI deployment. | |
60 | | |
61 | This flexibility ensures that developers can choose the model and service most suitable for their specific use cases, and allows for easy experimentation. | |
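
As a rough sketch of how interchangeable these classes are (the model identifiers below are assumptions, not recommendations):

```python
from smolagents import CodeAgent, HfApiModel, LiteLLMModel

# Serverless inference via Hugging Face; the model id is an assumption.
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# Switching providers only changes the model object, not the agent code:
# model = LiteLLMModel(model_id="gpt-4o-mini")

agent = CodeAgent(tools=[], model=model)
agent.run("In one sentence, why can code actions be easier to execute than JSON blobs?")
```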
62 | | |
63 | Now that we understand why and when to use smolagents, let's dive deeper into this powerful library! | |
64 | | |
65 | ## Resources | |
66 | | |
67 | - [smolagents Blog](https://huggingface.co/blog/smolagents) - Introduction to smolagents and code interactions | |
68 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit3/README.md: | |
-------------------------------------------------------------------------------- | |
https://raw.githubusercontent.com/huggingface/agents-course/main/units/en/unit3/README.md | |
-------------------------------------------------------------------------------- | |
/units/en/unit3/agentic-rag/agentic-rag.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Agentic Retrieval Augmented Generation (RAG) | |
2 | | |
3 | In this unit, we'll be taking a look at how we can use Agentic RAG to help Alfred prepare for the amazing gala. | |
4 | | |
5 | <Tip>We know we've already discussed Retrieval Augmented Generation (RAG) and agentic RAG in the previous unit, so feel free to skip ahead if you're already familiar with the concepts.</Tip> | |
6 | | |
7 | LLMs are trained on enormous bodies of data to learn general knowledge. | |
8 | However, an LLM's world knowledge may not always be relevant or up to date. | |
9 | **RAG solves this problem by finding and retrieving relevant information from your data and forwarding that to the LLM.** | |
10 | | |
11 |  | |
12 | | |
13 | Now, think about how Alfred works: | |
14 | | |
15 | 1. We've asked Alfred to help plan a gala | |
16 | 2. Alfred needs to find the latest news and weather information | |
17 | 3. Alfred needs to structure and search the guest information | |
18 | | |
19 | Just as Alfred needs to search through your household information to be helpful, any agent needs a way to find and understand relevant data. | |
20 | **Agentic RAG is a powerful way to use agents to answer questions about your data.** We can pass various tools to Alfred to help him answer questions. | |
21 | However, instead of automatically answering the question from retrieved documents alone, Alfred can decide to use any other tool or flow to answer the question. | |
22 | | |
23 |  | |
24 | | |
25 | Let's start **building our agentic RAG workflow!** | |
26 | | |
27 | First, we'll create a RAG tool to retrieve up-to-date details about the invitees. Next, we'll develop tools for web search, weather updates, and Hugging Face Hub model download statistics. Finally, we'll integrate everything to bring our agentic RAG agent to life! | |
28 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit3/agentic-rag/conclusion.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Conclusion | |
2 | | |
3 | In this unit, we've learned how to create an agentic RAG system to help Alfred, our friendly neighborhood agent, prepare for and manage an extravagant gala. | |
4 | | |
5 | The combination of RAG with agentic capabilities demonstrates how powerful AI assistants can become when they have: | |
6 | - Access to structured knowledge (guest information) | |
7 | - Ability to retrieve real-time information (web search) | |
8 | - Domain-specific tools (weather information, Hub stats) | |
9 | - Memory of past interactions | |
10 | | |
11 | With these capabilities, Alfred is now well-equipped to be the perfect host, able to answer questions about guests, provide up-to-date information, and ensure the gala runs smoothly—even managing the perfect timing for the fireworks display! | |
12 | | |
13 | <Tip> | |
14 | Now that you've built a complete agent, you might want to explore: | |
15 | | |
16 | - Creating more specialized tools for your own use cases | |
17 | - Implementing more sophisticated RAG systems with embeddings | |
18 | - Building multi-agent systems where agents can collaborate | |
19 | - Deploying your agent as a service that others can interact with | |
20 | | |
21 | </Tip> | |
22 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit3/agentic-rag/introduction.mdx: | |
-------------------------------------------------------------------------------- | |
1 | # Introduction to Use Case for Agentic RAG | |
2 | | |
3 |  | |
4 | | |
5 | In this unit, we will help Alfred, our friendly agent who is hosting the gala, by using Agentic RAG to create a tool that can be used to answer questions about the guests at the gala. | |
6 | | |
7 | <Tip> | |
8 | This is a 'real-world' use case for Agentic RAG that you could use in your own projects or workplaces. If you want to get more out of this project, why not try it out on your own use case and share your results in Discord? | |
9 | </Tip> | |
10 | | |
11 | | |
12 | You can choose any of the frameworks discussed in the course for this use case. We provide code samples for each in separate tabs. | |
13 | | |
14 | ## A Gala to Remember | |
15 | | |
16 | Now, it's time to get our hands dirty with an actual use case. Let's set the stage! | |
17 | | |
18 | **You decided to host the most extravagant and opulent party of the century.** This means lavish feasts, enchanting dancers, renowned DJs, exquisite drinks, a breathtaking fireworks display, and much more. | |
19 | | |
20 | Alfred, your friendly neighbourhood agent, is getting ready to watch over all of your needs for this party, and **Alfred is going to manage everything himself**. To do so, he needs to have access to all of the information about the party, including the menu, the guests, the schedule, weather forecasts, and much more! | |
21 | | |
22 | Not only that, but he also needs to make sure that the party is going to be a success, so **he needs to be able to answer any questions about the party during the party**, whilst handling unexpected situations that may arise. | |
23 | | |
24 | He can't do this alone, so we need to make sure that Alfred has access to all of the information and tools he needs. | |
25 | | |
26 | First, let's give him a list of hard requirements for the gala. | |
27 | | |
28 | ## The Gala Requirements | |
29 | | |
30 | A properly educated person in the age of the **Renaissance** needed to have three main traits: | |
31 | profound **knowledge of sports, culture, and science**. So, we need to make sure we can impress our guests with our knowledge and provide them with a truly unforgettable gala. | |
32 | However, to avoid any conflicts, there are some **topics, like politics and religion, that are to be avoided at a gala.** It needs to be a fun party without conflicts related to beliefs and ideals. | |
33 | | |
34 | According to etiquette, **a good host should be aware of guests' backgrounds**, including their interests and endeavours. A good host also shares stories and lighthearted gossip about the guests with the other attendees. | |
35 | | |
36 | Lastly, we need to make sure that we've got **some general knowledge about the weather**, so we can continuously get real-time updates and pick the perfect moment to launch the fireworks and end the gala with a bang! 🎆 | |
37 | | |
38 | As you can see, Alfred needs a lot of information to host the gala. | |
39 | Luckily, we can help and prepare Alfred by giving him some **Retrieval Augmented Generation (RAG) training!** | |
40 | | |
41 | Let's start by creating the tools that Alfred needs to be able to host the gala! | |
42 | | |
-------------------------------------------------------------------------------- | |
/units/en/unit4/README.md: | |
-------------------------------------------------------------------------------- | |
https://raw.githubusercontent.com/huggingface/agents-course/main/units/en/unit4/README.md | |
-------------------------------------------------------------------------------- |