Docs from Agents HF Course
└── units
    └── en
        ├── _toctree.yml
        ├── bonus-unit1
        │   ├── conclusion.mdx
        │   ├── fine-tuning.mdx
        │   ├── introduction.mdx
        │   └── what-is-function-calling.mdx
        ├── bonus-unit2
        │   ├── introduction.mdx
        │   ├── monitoring-and-evaluating-agents-notebook.mdx
        │   ├── quiz.mdx
        │   └── what-is-agent-observability-and-evaluation.mdx
        ├── communication
        │   ├── live1.mdx
        │   └── next-units.mdx
        ├── unit0
        │   ├── discord101.mdx
        │   ├── introduction.mdx
        │   └── onboarding.mdx
        ├── unit1
        │   ├── README.md
        │   ├── actions.mdx
        │   ├── agent-steps-and-structure.mdx
        │   ├── conclusion.mdx
        │   ├── dummy-agent-library.mdx
        │   ├── final-quiz.mdx
        │   ├── introduction.mdx
        │   ├── messages-and-special-tokens.mdx
        │   ├── observations.mdx
        │   ├── quiz1.mdx
        │   ├── quiz2.mdx
        │   ├── thoughts.mdx
        │   ├── tools.mdx
        │   ├── tutorial.mdx
        │   ├── what-are-agents.mdx
        │   └── what-are-llms.mdx
        ├── unit2
        │   ├── introduction.mdx
        │   ├── langgraph
        │   │   ├── building_blocks.mdx
        │   │   ├── conclusion.mdx
        │   │   ├── document_analysis_agent.mdx
        │   │   ├── first_graph.mdx
        │   │   ├── introduction.mdx
        │   │   ├── quiz1.mdx
        │   │   └── when_to_use_langgraph.mdx
        │   ├── llama-index
        │   │   ├── README.md
        │   │   ├── agents.mdx
        │   │   ├── components.mdx
        │   │   ├── conclusion.mdx
        │   │   ├── introduction.mdx
        │   │   ├── llama-hub.mdx
        │   │   ├── quiz1.mdx
        │   │   ├── quiz2.mdx
        │   │   ├── tools.mdx
        │   │   └── workflows.mdx
        │   └── smolagents
        │       ├── code_agents.mdx
        │       ├── conclusion.mdx
        │       ├── final_quiz.mdx
        │       ├── introduction.mdx
        │       ├── multi_agent_systems.mdx
        │       ├── quiz1.mdx
        │       ├── quiz2.mdx
        │       ├── retrieval_agents.mdx
        │       ├── tool_calling_agents.mdx
        │       ├── tools.mdx
        │       ├── vision_agents.mdx
        │       └── why_use_smolagents.mdx
        ├── unit3
        │   ├── README.md
        │   └── agentic-rag
        │       ├── agent.mdx
        │       ├── agentic-rag.mdx
        │       ├── conclusion.mdx
        │       ├── introduction.mdx
        │       ├── invitees.mdx
        │       └── tools.mdx
        └── unit4
            └── README.md
/units/en/_toctree.yml:
--------------------------------------------------------------------------------
1 | - title: Unit 0. Welcome to the course
2 | sections:
3 | - local: unit0/introduction
4 | title: Welcome to the course 🤗
5 | - local: unit0/onboarding
6 | title: Onboarding
7 | - local: unit0/discord101
8 | title: (Optional) Discord 101
9 | - title: Live 1. How the course works and Q&A
10 | sections:
11 | - local: communication/live1
12 | title: Live 1. How the course works and Q&A
13 | - title: Unit 1. Introduction to Agents
14 | sections:
15 | - local: unit1/introduction
16 | title: Introduction
17 | - local: unit1/what-are-agents
18 | title: What is an Agent?
19 | - local: unit1/quiz1
20 | title: Quick Quiz 1
21 | - local: unit1/what-are-llms
22 | title: What are LLMs?
23 | - local: unit1/messages-and-special-tokens
24 | title: Messages and Special Tokens
25 | - local: unit1/tools
26 | title: What are Tools?
27 | - local: unit1/quiz2
28 | title: Quick Quiz 2
29 | - local: unit1/agent-steps-and-structure
30 | title: Understanding AI Agents through the Thought-Action-Observation Cycle
31 | - local: unit1/thoughts
32 | title: Thought, Internal Reasoning and the Re-Act Approach
33 | - local: unit1/actions
34 | title: Actions, Enabling the Agent to Engage with Its Environment
35 | - local: unit1/observations
36 | title: Observe, Integrating Feedback to Reflect and Adapt
37 | - local: unit1/dummy-agent-library
38 | title: Dummy Agent Library
39 | - local: unit1/tutorial
40 | title: Let’s Create Our First Agent Using smolagents
41 | - local: unit1/final-quiz
42 | title: Unit 1 Final Quiz
43 | - local: unit1/conclusion
44 | title: Conclusion
45 | - title: Unit 2. Frameworks for AI Agents
46 | sections:
47 | - local: unit2/introduction
48 | title: Frameworks for AI Agents
49 | - title: Unit 2.1 The smolagents framework
50 | sections:
51 | - local: unit2/smolagents/introduction
52 | title: Introduction to smolagents
53 | - local: unit2/smolagents/why_use_smolagents
54 | title: Why use smolagents?
55 | - local: unit2/smolagents/quiz1
56 | title: Quick Quiz 1
57 | - local: unit2/smolagents/code_agents
58 | title: Building Agents That Use Code
59 | - local: unit2/smolagents/tool_calling_agents
60 | title: Writing actions as code snippets or JSON blobs
61 | - local: unit2/smolagents/tools
62 | title: Tools
63 | - local: unit2/smolagents/retrieval_agents
64 | title: Retrieval Agents
65 | - local: unit2/smolagents/quiz2
66 | title: Quick Quiz 2
67 | - local: unit2/smolagents/multi_agent_systems
68 | title: Multi-Agent Systems
69 | - local: unit2/smolagents/vision_agents
70 | title: Vision and Browser agents
71 | - local: unit2/smolagents/final_quiz
72 | title: Final Quiz
73 | - local: unit2/smolagents/conclusion
74 | title: Conclusion
75 | - title: Unit 2.2 The LlamaIndex framework
76 | sections:
77 | - local: unit2/llama-index/introduction
78 | title: Introduction to LlamaIndex
79 | - local: unit2/llama-index/llama-hub
80 | title: Introduction to LlamaHub
81 | - local: unit2/llama-index/components
82 | title: What are Components in LlamaIndex?
83 | - local: unit2/llama-index/tools
84 | title: Using Tools in LlamaIndex
85 | - local: unit2/llama-index/quiz1
86 | title: Quick Quiz 1
87 | - local: unit2/llama-index/agents
88 | title: Using Agents in LlamaIndex
89 | - local: unit2/llama-index/workflows
90 | title: Creating Agentic Workflows in LlamaIndex
91 | - local: unit2/llama-index/quiz2
92 | title: Quick Quiz 2
93 | - local: unit2/llama-index/conclusion
94 | title: Conclusion
95 | - title: Unit 2.3 The LangGraph framework
96 | sections:
97 | - local: unit2/langgraph/introduction
98 | title: Introduction to LangGraph
99 | - local: unit2/langgraph/when_to_use_langgraph
100 | title: What is LangGraph?
101 | - local: unit2/langgraph/building_blocks
102 | title: Building Blocks of LangGraph
103 | - local: unit2/langgraph/first_graph
104 | title: Building Your First LangGraph
105 | - local: unit2/langgraph/document_analysis_agent
106 | title: Document Analysis Graph
107 | - local: unit2/langgraph/quiz1
108 | title: Quick Quiz 1
109 | - local: unit2/langgraph/conclusion
110 | title: Conclusion
111 | - title: Unit 3. Use Case for Agentic RAG
112 | sections:
113 | - local: unit3/agentic-rag/introduction
114 | title: Introduction to Use Case for Agentic RAG
115 | - local: unit3/agentic-rag/agentic-rag
116 | title: Agentic Retrieval Augmented Generation (RAG)
117 | - local: unit3/agentic-rag/invitees
118 | title: Creating a RAG Tool for Guest Stories
119 | - local: unit3/agentic-rag/tools
120 | title: Building and Integrating Tools for Your Agent
121 | - local: unit3/agentic-rag/agent
122 | title: Creating Your Gala Agent
123 | - local: unit3/agentic-rag/conclusion
124 | title: Conclusion
125 | - title: Bonus Unit 1. Fine-tuning an LLM for Function-calling
126 | sections:
127 | - local: bonus-unit1/introduction
128 | title: Introduction
129 | - local: bonus-unit1/what-is-function-calling
130 | title: What is Function Calling?
131 | - local: bonus-unit1/fine-tuning
132 | title: Let's Fine-Tune your model for Function-calling
133 | - local: bonus-unit1/conclusion
134 | title: Conclusion
135 | - title: Bonus Unit 2. Agent Observability and Evaluation
136 | sections:
137 | - local: bonus-unit2/introduction
138 | title: Introduction
139 | - local: bonus-unit2/what-is-agent-observability-and-evaluation
140 | title: What is agent observability and evaluation?
141 | - local: bonus-unit2/monitoring-and-evaluating-agents-notebook
142 | title: Monitoring and evaluating agents
143 | - local: bonus-unit2/quiz
144 | title: Quiz
145 | - title: When will the next units be published?
146 | sections:
147 | - local: communication/next-units
148 | title: Next Units
149 |
150 |
--------------------------------------------------------------------------------
/units/en/bonus-unit1/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion [[conclusion]]
2 |
3 | Congratulations on finishing this first Bonus Unit 🥳
4 |
5 | You've just **mastered function-calling and how to fine-tune your model to do function-calling**!
6 |
7 | If we have one piece of advice now, it’s to try to **fine-tune different models**. The **best way to learn is by trying.**
8 |
9 | In the next Unit, you're going to learn how to use **state-of-the-art frameworks such as `smolagents`, `LlamaIndex` and `LangGraph`**.
10 |
11 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
12 |
13 | ### Keep Learning, Stay Awesome 🤗
--------------------------------------------------------------------------------
/units/en/bonus-unit1/fine-tuning.mdx:
--------------------------------------------------------------------------------
1 | # Let's Fine-Tune Your Model for Function-Calling
2 |
3 | We're now ready to fine-tune our first model for function-calling 🔥.
4 |
5 | ## How do we train our model for function-calling?
6 |
7 | > Answer: We need **data**
8 |
9 | A model training process can be divided into 3 steps:
10 |
11 | 1. **The model is pre-trained on a large quantity of data**. The output of that step is a **pre-trained model**. For instance, [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b). It's a base model that only knows how **to predict the next token, without strong instruction-following capabilities**.
12 |
13 | 2. To be useful in a chat context, the model then needs to be **fine-tuned** to follow instructions. In this step, it can be trained by model creators, the open-source community, you, or anyone. For instance, [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) is an instruction-tuned model by the Google Team behind the Gemma project.
14 |
15 | 3. The model can then be **aligned** to the creator's preferences. For instance, a customer service chat model that must never be impolite to customers.
16 |
17 | Usually a complete product like Gemini or Mistral **will go through all 3 steps**, whereas the models you can find on Hugging Face have completed one or more steps of this training.
18 |
19 | In this tutorial, we will build a function-calling model based on [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it). We choose the fine-tuned model [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) instead of the base model [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) because the fine-tuned model has been improved for our use-case.
20 |
21 | Starting from the pre-trained model **would require more training in order to learn instruction following, chat AND function-calling**.
22 |
23 | By starting from the instruction-tuned model, **we minimize the amount of information that our model needs to learn**.
24 |
25 | ## LoRA (Low-Rank Adaptation of Large Language Models)
26 |
27 | LoRA is a popular and lightweight training technique that significantly **reduces the number of trainable parameters**.
28 |
29 | It works by **inserting a smaller number of new weights as an adapter into the model to train**. This makes training with LoRA much faster, memory-efficient, and produces smaller model weights (a few hundred MBs), which are easier to store and share.
30 |
31 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/blog_multi-lora-serving_LoRA.gif" alt="LoRA inference" width="50%"/>
32 |
33 | LoRA works by adding pairs of rank decomposition matrices to Transformer layers, typically focusing on linear layers. During training, we will "freeze" the rest of the model and will only update the weights of those newly added adapters.
34 |
35 | By doing so, the number of **parameters** that we need to train drops considerably as we only need to update the adapter's weights.
36 |
37 | During inference, the input is passed into the adapter and the base model, or these adapter weights can be merged with the base model, resulting in no additional latency overhead.
38 |
39 | LoRA is particularly useful for adapting **large** language models to specific tasks or domains while keeping resource requirements manageable. This helps reduce the memory **required** to train a model.
40 |
41 | If you want to learn more about how LoRA works, you should check out this [tutorial](https://huggingface.co/learn/nlp-course/chapter11/4?fw=pt).
42 |
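To make this concrete, here is a minimal sketch of how a LoRA adapter can be attached with the `peft` library. The rank, alpha, and target modules shown are illustrative defaults, not the exact values used in the course notebook:

```python
# Minimal LoRA sketch with the `peft` library (hyperparameters are illustrative).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

lora_config = LoraConfig(
    r=16,                      # rank of the low-rank decomposition matrices
    lora_alpha=32,             # scaling factor applied to the adapter output
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # linear layers to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Freeze the base model and attach the trainable adapters
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of the weights are trainable
```

Only the adapter weights are updated during training; afterwards you can keep them as a separate adapter or merge them back into the base model, as described above.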
43 | ## Fine-Tuning a Model for Function-Calling
44 |
45 | You can access the tutorial notebook 👉 [here](https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb).
46 |
47 | Then, click on [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb) to be able to run it in a Colab Notebook.
48 |
49 |
50 |
--------------------------------------------------------------------------------
/units/en/bonus-unit1/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | ![Bonus Unit 1 Thumbnail](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/thumbnail.jpg)
4 |
5 | Welcome to this first **Bonus Unit**, where you'll learn to **fine-tune a Large Language Model (LLM) for function calling**.
6 |
7 | In terms of LLMs, function calling is quickly becoming a *must-know* technique.
8 |
9 | The idea is that, rather than relying only on prompt-based approaches as we did in Unit 1, function calling trains your model to **take actions and interpret observations during the training phase**, making your AI more robust.
10 |
11 | > **When should I do this Bonus Unit?**
12 | >
13 | > This section is **optional** and is more advanced than Unit 1, so don't hesitate to either do this unit now or revisit it when your knowledge has improved thanks to this course.
14 | >
15 | > But don't worry, this Bonus Unit is designed to have all the information you need, so we'll walk you through every core concept of fine-tuning a model for function-calling even if you haven’t yet learned the inner workings of fine-tuning.
16 |
17 | The best way to follow this Bonus Unit is to:
18 |
19 | 1. Know how to fine-tune an LLM with Transformers. If that's not the case, [check this](https://huggingface.co/learn/nlp-course/chapter3/1?fw=pt).
20 |
21 | 2. Know how to use `SFTTrainer` to fine-tune a model. To learn more, [check this documentation](https://huggingface.co/learn/nlp-course/en/chapter11/1).
22 |
23 | ---
24 |
25 | ## What You’ll Learn
26 |
27 | 1. **Function Calling**
28 | How modern LLMs structure their conversations in a way that lets them trigger **Tools**.
29 |
30 | 2. **LoRA (Low-Rank Adaptation)**
31 | A **lightweight and efficient** fine-tuning method that cuts down on computational and storage overhead. LoRA makes training large models *faster, cheaper, and easier* to deploy.
32 |
33 | 3. **The Thought → Act → Observe Cycle** in Function Calling models
34 | A simple but powerful approach for structuring how your model decides when (and how) to call functions, track intermediate steps, and interpret the results from external Tools or APIs.
35 |
36 | 4. **New Special Tokens**
37 | We’ll introduce **special markers** that help the model distinguish between:
38 | - Internal “chain-of-thought” reasoning
39 | - Outgoing function calls
40 | - Responses coming back from external tools
41 |
42 | ---
43 |
44 | By the end of this bonus unit, you’ll be able to:
45 |
46 | - **Understand** the inner workings of APIs when it comes to Tools.
47 | - **Fine-tune** a model using the LoRA technique.
48 | - **Implement** and **modify** the Thought → Act → Observe cycle to create robust and maintainable Function-calling workflows.
49 | - **Design and utilize** special tokens to seamlessly separate the model’s internal reasoning from its external actions.
50 |
51 | And you'll **have fine-tuned your own model to do function calling.** 🔥
52 |
53 | Let’s dive into **function calling**!
54 |
--------------------------------------------------------------------------------
/units/en/bonus-unit1/what-is-function-calling.mdx:
--------------------------------------------------------------------------------
1 | # What is Function Calling?
2 |
3 | Function-calling is a **way for an LLM to take actions on its environment**. It was first [introduced in GPT-4](https://openai.com/index/function-calling-and-other-api-updates/), and was later reproduced in other models.
4 |
5 | Just like the tools of an Agent, function-calling gives the model the capacity to **take an action on its environment**. However, the function-calling capacity **is learned by the model**, and relies **less on prompting than other agent techniques**.
6 |
7 | In Unit 1, the Agent **didn't learn to use the Tools**: we just provided the list, and we relied on the fact that the model **was able to generalize on defining a plan using these Tools**.
8 |
9 | Here, in contrast, **with function-calling, the Agent is fine-tuned (trained) to use Tools**.
10 |
11 | ## How does the model "learn" to take an action?
12 |
13 | In Unit 1, we explored the general workflow of an agent. Once the user has given some tools to the agent and prompted it with a query, the model will cycle through:
14 |
15 | 1. *Think*: What action(s) do I need to take in order to fulfill the objective?
16 | 2. *Act*: Format the action with the correct parameters and stop the generation.
17 | 3. *Observe*: Get back the result from the execution.
18 |
19 | In a "typical" conversation with a model through an API, the conversation will alternate between user and assistant messages like this:
20 |
21 | ```python
22 | conversation = [
23 | {"role": "user", "content": "I need help with my order"},
24 | {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
25 | {"role": "user", "content": "It's ORDER-123"},
26 | ]
27 | ```
28 |
29 | Function-calling brings **new roles to the conversation**!
30 |
31 | 1. One new role for an **Action**
32 | 2. One new role for an **Observation**
33 |
34 | If we take the [Mistral API](https://docs.mistral.ai/capabilities/function_calling/) as an example, it would look like this:
35 |
36 | ```python
37 | conversation = [
38 | {
39 | "role": "user",
40 | "content": "What's the status of my transaction T1001?"
41 | },
42 | {
43 | "role": "assistant",
44 | "content": "",
45 | "function_call": {
46 | "name": "retrieve_payment_status",
47 | "arguments": "{\"transaction_id\": \"T1001\"}"
48 | }
49 | },
50 | {
51 | "role": "tool",
52 | "name": "retrieve_payment_status",
53 | "content": "{\"status\": \"Paid\"}"
54 | },
55 | {
56 | "role": "assistant",
57 | "content": "Your transaction T1001 has been successfully paid."
58 | }
59 | ]
60 | ```
61 |
62 | > ... But you said there's a new role for function calls?
63 |
64 | **Yes and no**. In this case, and in a lot of other APIs, the model formats the action to take as an "assistant" message. The chat template will then represent this as **special tokens** for function-calling.
65 |
66 | - `[AVAILABLE_TOOLS]` – Start the list of available tools
67 | - `[/AVAILABLE_TOOLS]` – End the list of available tools
68 | - `[TOOL_CALLS]` – Make a call to a tool (i.e., take an "Action")
69 | - `[TOOL_RESULTS]` – "Observe" the result of the action
70 | - `[/TOOL_RESULTS]` – End of the observation (i.e., the model can decode again)
71 |
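To give a rough idea, here is how the conversation above might look once the chat template has expanded it into those special tokens. This is only a schematic rendering; the exact formatting depends on the model's chat template and tokenizer version:

```
[AVAILABLE_TOOLS][{"type": "function", "function": {"name": "retrieve_payment_status", ...}}][/AVAILABLE_TOOLS]
[INST] What's the status of my transaction T1001? [/INST]
[TOOL_CALLS][{"name": "retrieve_payment_status", "arguments": {"transaction_id": "T1001"}}]
[TOOL_RESULTS]{"status": "Paid"}[/TOOL_RESULTS]
Your transaction T1001 has been successfully paid.
```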
72 | We'll talk again about function-calling in this course, but if you want to dive deeper you can check [this excellent documentation section](https://docs.mistral.ai/capabilities/function_calling/).
73 |
74 | ---
75 | Now that we've learned what function-calling is and how it works, let's **add function-calling capabilities to a model that does not have them yet**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), by appending some new special tokens to the model.
76 |
77 | To do that, **we first need to understand fine-tuning and LoRA**.
78 |
--------------------------------------------------------------------------------
/units/en/bonus-unit2/introduction.mdx:
--------------------------------------------------------------------------------
1 | # AI Agent Observability & Evaluation
2 |
3 | ![Bonus Unit 2 Thumbnail](https://langfuse.com/images/cookbook/huggingface-agent-course/agent-observability-and-evaluation.png)
4 |
5 | Welcome to **Bonus Unit 2**! In this chapter, you'll explore advanced strategies for observing, evaluating, and ultimately improving the performance of your agents.
6 |
7 | ---
8 |
9 | ## 📚 When Should I Do This Bonus Unit?
10 |
11 | This bonus unit is perfect if you:
12 | - **Develop and Deploy AI Agents:** You want to ensure that your agents are performing reliably in production.
13 | - **Need Detailed Insights:** You're looking to diagnose issues, optimize performance, or understand the inner workings of your agent.
14 | - **Aim to Reduce Operational Overhead:** By monitoring agent costs, latency, and execution details, you can efficiently manage resources.
15 | - **Seek Continuous Improvement:** You’re interested in integrating both real-time user feedback and automated evaluation into your AI applications.
16 |
17 | In short, it's for everyone who wants to bring their agents in front of users!
18 |
19 | ---
20 |
21 | ## 🤓 What You’ll Learn
22 |
23 | In this unit, you'll learn:
24 | - **Instrument Your Agent:** Integrate observability tools via OpenTelemetry with the *smolagents* framework.
25 | - **Monitor Metrics:** Track performance indicators such as token usage (costs), latency, and error traces.
26 | - **Evaluate in Real-Time:** Understand techniques for live evaluation, including gathering user feedback and leveraging an LLM-as-a-judge.
27 | - **Offline Analysis:** Use benchmark datasets (e.g., GSM8K) to test and compare agent performance.
28 |
29 | ---
30 |
31 | ## 🚀 Ready to Get Started?
32 |
33 | In the next section, you'll learn the basics of Agent Observability and Evaluation. After that, it's time to see it in action!
--------------------------------------------------------------------------------
/units/en/bonus-unit2/quiz.mdx:
--------------------------------------------------------------------------------
1 | # Quiz: Evaluating AI Agents
2 |
3 | Let's assess your understanding of the agent tracing and evaluation concepts covered in this bonus unit.
4 |
5 | This quiz is optional and ungraded.
6 |
7 | ### Q1: What does observability in AI agents primarily refer to?
8 | Which statement accurately describes the purpose of observability for AI agents?
9 |
10 | <Question
11 | choices={[
12 | {
13 | text: "It involves tracking internal operations through logs, metrics, and spans to understand agent behavior.",
14 | explain: "Correct! Observability means using logs, metrics, and spans to shed light on the inner workings of the agent.",
15 | correct: true
16 | },
17 | {
18 | text: "It is solely focused on reducing the financial cost of running the agent.",
19 | explain: "Observability covers cost but is not limited to it."
20 | },
21 | {
22 | text: "It refers only to the external appearance and UI of the agent.",
23 | explain: "Observability is about the internal processes, not the UI."
24 | },
25 | {
26 | text: "It is concerned with coding style and code aesthetics only.",
27 | explain: "Code style is unrelated to observability in this context."
28 | }
29 | ]}
30 | />
31 |
32 | ### Q2: Which of the following is NOT a common metric monitored in agent observability?
33 | Select the metric that does not typically fall under the observability umbrella.
34 |
35 | <Question
36 | choices={[
37 | {
38 | text: "Latency",
39 | explain: "Latency is commonly tracked to assess agent responsiveness."
40 | },
41 | {
42 | text: "Cost per Agent Run",
43 | explain: "Monitoring cost is a key aspect of observability."
44 | },
45 | {
46 | text: "User Feedback and Ratings",
47 | explain: "User feedback is crucial for evaluating agent performance."
48 | },
49 | {
50 | text: "Lines of Code of the Agent",
51 | explain: "The number of lines of code is not a typical observability metric.",
52 | correct: true
53 | }
54 | ]}
55 | />
56 |
57 | ### Q3: What best describes offline evaluation of an AI agent?
58 | Determine the statement that correctly captures the essence of offline evaluation.
59 |
60 | <Question
61 | choices={[
62 | {
63 | text: "Evaluating the agent using real user interactions in a live environment.",
64 | explain: "This describes online evaluation rather than offline."
65 | },
66 | {
67 | text: "Assessing agent performance using curated datasets with known ground truth.",
68 | explain: "Correct! Offline evaluation uses test datasets to gauge performance against known answers.",
69 | correct: true
70 | },
71 | {
72 | text: "Monitoring the agent's internal logs in real-time.",
73 | explain: "This is more related to observability rather than evaluation."
74 | },
75 | {
76 | text: "Running the agent without any evaluation metrics.",
77 | explain: "This approach does not provide meaningful insights."
78 | }
79 | ]}
80 | />
81 |
82 | ### Q4: Which advantage does online evaluation of agents offer?
83 | Pick the statement that best reflects the benefit of online evaluation.
84 |
85 | <Question
86 | choices={[
87 | {
88 | text: "It provides controlled testing scenarios using pre-defined datasets.",
89 | explain: "Controlled testing is a benefit of offline evaluation, not online."
90 | },
91 | {
92 | text: "It captures live user interactions and real-world performance data.",
93 | explain: "Correct! Online evaluation offers insights by monitoring the agent in a live setting.",
94 | correct: true
95 | },
96 | {
97 | text: "It eliminates the need for any offline testing and benchmarks.",
98 | explain: "Both offline and online evaluations are important and complementary."
99 | },
100 | {
101 | text: "It solely focuses on reducing the computational cost of the agent.",
102 | explain: "Cost monitoring is part of observability, not the primary advantage of online evaluation."
103 | }
104 | ]}
105 | />
106 |
107 | ### Q5: What role does OpenTelemetry play in AI agent observability and evaluation?
108 | Which statement best describes the role of OpenTelemetry in monitoring AI agents?
109 |
110 | <Question
111 | choices={[
112 | {
113 | text: "It provides a standardized framework to instrument code, enabling the collection of traces, metrics, and logs for observability.",
114 | explain: "Correct! OpenTelemetry standardizes instrumentation for telemetry data, which is crucial for monitoring and diagnosing agent behavior.",
115 | correct: true
116 | },
117 | {
118 | text: "It acts as a replacement for manual debugging by automatically fixing code issues.",
119 | explain: "Incorrect. OpenTelemetry is used for gathering telemetry data, not for debugging code issues."
120 | },
121 | {
122 | text: "It primarily serves as a database for storing historical logs without real-time capabilities.",
123 | explain: "Incorrect. OpenTelemetry focuses on real-time telemetry data collection and exporting data to analysis tools."
124 | },
125 | {
126 | text: "It is used to optimize the computational performance of the AI agent by automatically tuning model parameters.",
127 | explain: "Incorrect. OpenTelemetry is centered on observability rather than performance tuning."
128 | }
129 | ]}
130 | />
131 |
132 | Congratulations on completing this quiz! 🎉 If you missed any questions, consider reviewing the content of this bonus unit for a deeper understanding. If you did well, you're ready to explore more advanced topics in agent observability and evaluation!
133 |
--------------------------------------------------------------------------------
/units/en/bonus-unit2/what-is-agent-observability-and-evaluation.mdx:
--------------------------------------------------------------------------------
1 | # AI Agent Observability and Evaluation
2 |
3 | ## 🔎 What is Observability?
4 |
5 | Observability is about understanding what's happening inside your AI agent by looking at external signals like logs, metrics, and traces. For AI agents, this means tracking actions, tool usage, model calls, and responses to debug and improve agent performance.
6 |
7 | ![Observability dashboard](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit2/langfuse-dashboard.png)
8 |
9 | ## 🔭 Why Agent Observability Matters
10 |
11 | Without observability, AI agents are "black boxes." Observability tools make agents transparent, enabling you to:
12 |
13 | - Understand costs and accuracy trade-offs
14 | - Measure latency
15 | - Detect harmful language & prompt injection
16 | - Monitor user feedback
17 |
18 | In other words, it makes your demo agent ready for production!
19 |
20 | ## 🔨 Observability Tools
21 |
22 | Common observability tools for AI agents include platforms like [Langfuse](https://langfuse.com) and [Arize](https://www.arize.com). These tools help collect detailed traces and offer dashboards to monitor metrics in real-time, making it easy to detect problems and optimize performance.
23 |
24 | Observability tools vary widely in their features and capabilities. Some tools are open source, benefiting from large communities that shape their roadmaps and extensive integrations. Additionally, certain tools specialize in specific aspects of LLMOps—such as observability, evaluations, or prompt management—while others are designed to cover the entire LLMOps workflow. We encourage you to explore the documentation of different options to pick a solution that works well for you.
25 |
26 | Many agent frameworks such as [smolagents](https://huggingface.co/docs/smolagents/v1.12.0/en/index) use the [OpenTelemetry](https://opentelemetry.io/docs/) standard to expose metadata to the observability tools. In addition to this, observability tools build custom instrumentations to allow for more flexibility in the fast moving world of LLMs. You should check the documentation of the tool you are using to see what is supported.
27 |
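As a concrete illustration, here is a minimal instrumentation sketch. It assumes the `openinference-instrumentation-smolagents` package is installed and that your backend's OTLP endpoint and credentials (for example, Langfuse's) are configured through the standard `OTEL_EXPORTER_OTLP_*` environment variables:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Export spans to whatever OTLP endpoint is configured in the environment
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))

# From here on, every smolagents run is traced automatically
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
```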
28 | ## 🔬 Traces and Spans
29 |
30 | Observability tools usually represent agent runs as traces and spans.
31 |
32 | - **Traces** represent a complete agent task from start to finish (like handling a user query).
33 | - **Spans** are individual steps within the trace (like calling a language model or retrieving data).
34 |
35 | ![Example of a smolagent trace in Langfuse](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit2/trace-tree.png)
36 |
37 | ## 📊 Key Metrics to Monitor
38 |
39 | Here are some of the most common metrics that observability tools monitor:
40 |
41 | **Latency:** How quickly does the agent respond? Long waiting times negatively impact user experience. You should measure latency for tasks and individual steps by tracing agent runs. For example, an agent that takes 20 seconds for all model calls could be accelerated by using a faster model or by running model calls in parallel.
42 |
43 | **Costs:** What’s the expense per agent run? AI agents rely on LLM calls billed per token or external APIs. Frequent tool usage or multiple prompts can rapidly increase costs. For instance, if an agent calls an LLM five times for marginal quality improvement, you must assess if the cost is justified or if you could reduce the number of calls or use a cheaper model. Real-time monitoring can also help identify unexpected spikes (e.g., bugs causing excessive API loops).
44 |
45 | **Request Errors:** How many requests did the agent fail? This can include API errors or failed tool calls. To make your agent more robust against these in production, you can set up fallbacks or retries, e.g. switching to LLM provider B as a backup if LLM provider A is down.
46 |
47 | **User Feedback:** Implementing direct user evaluations provides valuable insights. This can include explicit ratings (👍 thumbs-up/👎 thumbs-down, ⭐ 1–5 stars) or textual comments. Consistent negative feedback should alert you, as it is a sign that the agent is not working as expected.
48 |
49 | **Implicit User Feedback:** User behaviors provide indirect feedback even without explicit ratings. This can include immediate question rephrasing, repeated queries, or clicking a retry button. For example, if you see that users repeatedly ask the same question, this is a sign that the agent is not working as expected.
50 |
51 | **Accuracy:** How frequently does the agent produce correct or desirable outputs? Accuracy definitions vary (e.g., problem-solving correctness, information retrieval accuracy, user satisfaction). The first step is to define what success looks like for your agent. You can track accuracy via automated checks, evaluation scores, or task completion labels. For example, marking traces as "succeeded" or "failed".
52 |
53 | **Automated Evaluation Metrics:** You can also set up automated evals. For instance, you can use an LLM to score the output of the agent, e.g. whether it is helpful, accurate, or not. There are also several open-source libraries that help you score different aspects of the agent, e.g. [RAGAS](https://docs.ragas.io/) for RAG agents or [LLM Guard](https://llm-guard.com/) to detect harmful language or prompt injection.
54 |
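As a small illustration, an LLM-as-a-judge check can be as simple as the sketch below. The judge prompt and the `call_llm` helper are assumptions for the example, not part of any specific library:

```python
import json

JUDGE_PROMPT = """You are evaluating an AI agent's answer.
Question: {question}
Answer: {answer}
Rate the helpfulness from 1 to 5 and reply as JSON: {{"score": <int>, "reason": "<short reason>"}}"""

def judge(question: str, answer: str) -> dict:
    # `call_llm` is a hypothetical helper that sends a prompt to an LLM and returns its text reply
    response = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(response)

# The resulting score can then be attached to the corresponding trace in your observability tool.
```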
55 | In practice, a combination of these metrics gives the best coverage of an AI agent’s health. In this chapter’s [example notebook](https://huggingface.co/learn/agents-course/en/bonus-unit2/monitoring-and-evaluating-agents-notebook), we'll show you how these metrics look in real examples, but first, we'll learn what a typical evaluation workflow looks like.
56 |
57 | ## 👍 Evaluating AI Agents
58 |
59 | Observability gives us metrics, but evaluation is the process of analyzing that data (and performing tests) to determine how well an AI agent is performing and how it can be improved. In other words, once you have those traces and metrics, how do you use them to judge the agent and make decisions?
60 |
61 | Regular evaluation is important because AI agents are often non-deterministic and can evolve (through updates or drifting model behavior) – without evaluation, you wouldn’t know if your “smart agent” is actually doing its job well or if it’s regressed.
62 |
63 | There are two categories of evaluations for AI agents: **online evaluation** and **offline evaluation**. Both are valuable, and they complement each other. We usually begin with offline evaluation, as this is the minimum necessary step before deploying any agent.
64 |
65 | ### 🥷 Offline Evaluation
66 |
67 | ![Dataset items in Langfuse](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit2/example-dataset.png)
68 |
69 | This involves evaluating the agent in a controlled setting, typically using test datasets, not live user queries. You use curated datasets where you know what the expected output or correct behavior is, and then run your agent on those.
70 |
71 | For instance, if you built a math word-problem agent, you might have a [test dataset](https://huggingface.co/datasets/gsm8k) of 100 problems with known answers. Offline evaluation is often done during development (and can be part of CI/CD pipelines) to check improvements or guard against regressions. The benefit is that it’s **repeatable and you can get clear accuracy metrics since you have ground truth**. You might also simulate user queries and measure the agent’s responses against ideal answers or use automated metrics as described above.
72 |
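A bare-bones version of such an offline check might look like the sketch below, where `run_agent` is a hypothetical wrapper around your agent that returns its final answer as a string:

```python
# Minimal offline-evaluation sketch on a small GSM8K slice (illustrative only).
from datasets import load_dataset

dataset = load_dataset("gsm8k", "main", split="test").select(range(20))  # small smoke-test subset

correct = 0
for item in dataset:
    prediction = run_agent(item["question"])                  # hypothetical helper wrapping your agent
    ground_truth = item["answer"].split("####")[-1].strip()   # GSM8K stores the final answer after "####"
    correct += int(prediction.strip() == ground_truth)

print(f"Accuracy: {correct / len(dataset):.2%}")
```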
73 | The key challenge with offline eval is ensuring your test dataset is comprehensive and stays relevant – the agent might perform well on a fixed test set but encounter very different queries in production. Therefore, you should keep test sets updated with new edge cases and examples that reflect real-world scenarios. A mix of small “smoke test” cases and larger evaluation sets is useful: small sets for quick checks and larger ones for broader performance metrics.
74 |
75 | ### 🔄 Online Evaluation
76 |
77 | This refers to evaluating the agent in a live, real-world environment, i.e. during actual usage in production. Online evaluation involves monitoring the agent’s performance on real user interactions and analyzing outcomes continuously.
78 |
79 | For example, you might track success rates, user satisfaction scores, or other metrics on live traffic. The advantage of online evaluation is that it **captures things you might not anticipate in a lab setting** – you can observe model drift over time (if the agent’s effectiveness degrades as input patterns shift) and catch unexpected queries or situations that weren’t in your test data. It provides a true picture of how the agent behaves in the wild.
80 |
81 | Online evaluation often involves collecting implicit and explicit user feedback, as discussed, and possibly running shadow tests or A/B tests (where a new version of the agent runs in parallel to compare against the old). The challenge is that it can be tricky to get reliable labels or scores for live interactions – you might rely on user feedback or downstream metrics (like did the user click the result).
82 |
83 | ### 🤝 Combining the two
84 |
85 | In practice, successful AI agent evaluation blends **online** and **offline** methods. You might run regular offline benchmarks to quantitatively score your agent on defined tasks and continuously monitor live usage to catch things the benchmarks miss. For example, offline tests can catch if a code-generation agent’s success rate on a known set of problems is improving, while online monitoring might alert you that users have started asking a new category of question that the agent struggles with. Combining both gives a more robust picture.
86 |
87 | In fact, many teams adopt a loop: _offline evaluation → deploy new agent version → monitor online metrics and collect new failure examples → add those examples to offline test set → iterate_. This way, evaluation is continuous and ever-improving.
88 |
89 | ## 🧑‍💻 Let's see how this works in practice
90 |
91 | In the next section, we'll see examples of how we can use observability tools to monitor and evaluate our agent.
92 |
93 |
94 |
--------------------------------------------------------------------------------
/units/en/communication/live1.mdx:
--------------------------------------------------------------------------------
1 | # Live 1: How the Course Works and First Q&A
2 |
3 | In this first live stream of the Agents Course, we explained how the course **works** (scope, units, challenges, and more) and answered your questions.
4 |
5 | <iframe width="560" height="315" src="https://www.youtube.com/embed/iLVyYDbdSmM?si=TCX5Ai3uZuKLXq45" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
6 |
7 | To know when the next live session is scheduled, check our **Discord server**. We will also send you an email. If you can’t participate, don’t worry, we **record all live sessions**.
8 |
--------------------------------------------------------------------------------
/units/en/communication/next-units.mdx:
--------------------------------------------------------------------------------
1 | # When will the next units be published?
2 |
3 | Here's the publication schedule:
4 |
5 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/next-units.jpg" alt="Next Units" width="100%"/>
6 |
7 | Don't forget to <a href="https://bit.ly/hf-learn-agents">sign up for the course</a>! By signing up, **we can send you the links as each unit is published, along with updates and details about upcoming challenges**.
8 |
9 | Keep Learning, Stay Awesome 🤗
--------------------------------------------------------------------------------
/units/en/unit0/discord101.mdx:
--------------------------------------------------------------------------------
1 | # (Optional) Discord 101 [[discord-101]]
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/discord-etiquette.jpg" alt="The Discord Etiquette" width="100%"/>
4 |
5 | This guide is designed to help you get started with Discord, a free chat platform popular in the gaming and ML communities.
6 |
7 | Join the Hugging Face Community Discord server, which **has over 100,000 members**, by clicking <a href="https://discord.gg/UrrTSsSyjb" target="_blank">here</a>. It's a great place to connect with others!
8 |
9 | ## The Agents course on Hugging Face's Discord Community
10 |
11 | Starting on Discord can be a bit overwhelming, so here's a quick guide to help you navigate.
12 |
13 | <!-- Not the case anymore, you'll be prompted to choose your interests. Be sure to select **"AI Agents"** to gain access to the AI Agents Category, which includes all the course-related channels. Feel free to explore and join additional channels if you wish! 🚀-->
14 |
15 | The HF Community Server hosts a vibrant community with interests in various areas, offering opportunities for learning through paper discussions, events, and more.
16 |
17 | After [signing up](http://hf.co/join/discord), introduce yourself in the `#introduce-yourself` channel.
18 |
19 | We created 4 channels for the Agents Course:
20 |
21 | - `agents-course-announcements`: for the **latest course information**.
22 | - `🎓-agents-course-general`: for **general discussions and chitchat**.
23 | - `agents-course-questions`: to **ask questions and help your classmates**.
24 | - `agents-course-showcase`: to **show your best agents**.
25 |
26 | In addition, you can check:
27 |
28 | - `smolagents`: for **discussion and support with the library**.
29 |
30 | ## Tips for using Discord effectively
31 |
32 | ### How to join a server
33 |
34 | If you are less familiar with Discord, you might want to check out this <a href="https://support.discord.com/hc/en-us/articles/360034842871-How-do-I-join-a-Server#h_01FSJF9GT2QJMS2PRAW36WNBS8" target="_blank">guide</a> on how to join a server.
35 |
36 | Here's a quick summary of the steps:
37 |
38 | 1. Click on the <a href="https://discord.gg/UrrTSsSyjb" target="_blank">Invite Link</a>.
39 | 2. Sign in with your Discord account, or create an account if you don't have one.
40 | 3. Validate that you are not an AI agent!
41 | 4. Set up your nickname and avatar.
42 | 5. Click "Join Server".
43 |
44 | ### How to use Discord effectively
45 |
46 | Here are a few tips for using Discord effectively:
47 |
48 | - **Voice channels** are available, though text chat is more commonly used.
49 | - You can format text using **markdown style**, which is especially useful for writing code. Note that markdown doesn't work as well for links.
50 | - Consider opening threads for **long conversations** to keep discussions organized.
51 |
52 | We hope you find this guide helpful! If you have any questions, feel free to ask us on Discord 🤗.
53 |
--------------------------------------------------------------------------------
/units/en/unit0/onboarding.mdx:
--------------------------------------------------------------------------------
1 | # Onboarding: Your First Steps ⛵
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/time-to-onboard.jpg" alt="Time to Onboard" width="100%"/>
4 |
5 | Now that you have all the details, let's get started! We're going to do four things:
6 |
7 | 1. **Create your Hugging Face Account** if it's not already done
8 | 2. **Sign up to Discord and introduce yourself** (don't be shy 🤗)
9 | 3. **Follow the Hugging Face Agents Course** on the Hub
10 | 4. **Spread the word** about the course
11 |
12 | ### Step 1: Create Your Hugging Face Account
13 |
14 | (If you haven't already) create a Hugging Face account <a href='https://huggingface.co/join' target='_blank'>here</a>.
15 |
16 | ### Step 2: Join Our Discord Community
17 |
18 | 👉🏻 Join our discord server <a href="https://discord.gg/UrrTSsSyjb" target="_blank">here.</a>
19 |
20 | When you join, remember to introduce yourself in `#introduce-yourself`.
21 |
22 | We have multiple AI Agent-related channels:
23 | - `agents-course-announcements`: for the **latest course information**.
24 | - `🎓-agents-course-general`: for **general discussions and chitchat**.
25 | - `agents-course-questions`: to **ask questions and help your classmates**.
26 | - `agents-course-showcase`: to **show your best agents**.
27 |
28 | In addition, you can check:
29 |
30 | - `smolagents`: for **discussion and support with the library**.
31 |
32 | If this is your first time using Discord, we wrote a Discord 101 to get the best practices. Check [the next section](discord101).
33 |
34 | ### Step 3: Follow the Hugging Face Agent Course Organization
35 |
36 | Stay up to date with the latest course materials, updates, and announcements **by following the Hugging Face Agents Course Organization**.
37 |
38 | 👉 Go <a href="https://huggingface.co/agents-course" target="_blank">here</a> and click on **follow**.
39 |
40 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/hf_course_follow.gif" alt="Follow" width="100%"/>
41 |
42 | ### Step 4: Spread the word about the course
43 |
44 | Help us make this course more visible! There are two ways you can help us:
45 |
46 | 1. Show your support by starring ⭐ <a href="https://github.com/huggingface/agents-course" target="_blank">the course's repository</a>.
47 |
48 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/please_star.gif" alt="Repo star"/>
49 |
50 | 2. Share Your Learning Journey: Let others **know you're taking this course**! We've prepared an illustration you can use in your social media posts.
51 |
52 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png">
53 |
54 | You can download the image by clicking 👉 [here](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png?download=true)
55 |
56 | ### Step 5: Running Models Locally with Ollama (In case you run into Credit limits)
57 |
58 | 1. **Install Ollama**
59 |
60 | Follow the official instructions <a href="https://ollama.com/download" target="_blank">here</a>.
61 |
62 | 2. **Pull a model Locally**
63 | ``` bash
64 | ollama pull qwen2:7b  # Check out the Ollama website for more models
65 | ```
66 | 3. **Start Ollama in the background (In one terminal)**
67 | ``` bash
68 | ollama serve
69 | ```
70 | 4. **Use `LiteLLMModel` Instead of `HfApiModel`**
71 | ```python
72 | from smolagents import LiteLLMModel
73 |
74 | model = LiteLLMModel(
75 | model_id="ollama_chat/qwen2:7b", # Or try other Ollama-supported models
76 | api_base="http://127.0.0.1:11434", # Default Ollama local server
77 | num_ctx=8192,
78 | )
79 | ```
80 |
81 | 5. **Why does this work?**
82 | - Ollama serves models locally using an OpenAI-compatible API at `http://localhost:11434`.
83 | - `LiteLLMModel` is built to communicate with any model that supports the OpenAI chat/completion API format.
84 | - This means you can simply swap out `HfApiModel` for `LiteLLMModel` with no other code changes required. It’s a seamless, plug-and-play solution.
85 |
86 | Congratulations! 🎉 **You've completed the onboarding process**! You're now ready to start learning about AI Agents. Have fun!
87 |
88 | Keep Learning, stay awesome 🤗
89 |
--------------------------------------------------------------------------------
/units/en/unit1/README.md:
--------------------------------------------------------------------------------
1 | # Table of Contents
2 |
3 | You can access Unit 1 on hf.co/learn 👉 <a href="https://hf.co/learn/agents-course/unit1/introduction">here</a>
4 |
5 | <!--
6 | | Title | Description |
7 | |-------|-------------|
8 | | [Definition of an Agent](1_definition_of_an_agent.md) | General example of what agents can do without technical jargon. |
9 | | [Explain LLMs](2_explain_llms.md) | Explanation of Large Language Models, including the family tree of models and suitable models for agents. |
10 | | [Messages and Special Tokens](3_messages_and_special_tokens.md) | Explanation of messages, special tokens, and chat-template usage. |
11 | | [Dummy Agent Library](4_dummy_agent_library.md) | Introduction to using a dummy agent library and serverless API. |
12 | | [Tools](5_tools.md) | Overview of Pydantic for agent tools and other common tool formats. |
13 | | [Agent Steps and Structure](6_agent_steps_and_structure.md) | Steps involved in an agent, including thoughts, actions, observations, and a comparison between code agents and JSON agents. |
14 | | [Thoughts](7_thoughts.md) | Explanation of thoughts and the ReAct approach. |
15 | | [Actions](8_actions.md) | Overview of actions and stop and parse approach. |
16 | | [Observations](9_observations.md) | Explanation of observations and append result to reflect. |
17 | | [Quizz](10_quizz.md) | Contains quizzes to test understanding of the concepts. |
18 | | [Simple Use Case](11_simple_use_case.md) | Provides a simple use case exercise using datetime and a Python function as a tool. |
19 | -->
--------------------------------------------------------------------------------
/units/en/unit1/actions.mdx:
--------------------------------------------------------------------------------
1 | # Actions: Enabling the Agent to Engage with Its Environment
2 |
3 | <Tip>
4 | In this section, we explore the concrete steps an AI agent takes to interact with its environment.
5 |
6 | We’ll cover how actions are represented (using JSON or code), the importance of the stop and parse approach, and introduce different types of agents.
7 | </Tip>
8 |
9 | Actions are the concrete steps an **AI agent takes to interact with its environment**.
10 |
11 | Whether it’s browsing the web for information or controlling a physical device, each action is a deliberate operation executed by the agent.
12 |
13 | For example, an agent assisting with customer service might retrieve customer data, offer support articles, or transfer issues to a human representative.
14 |
15 | ## Types of Agent Actions
16 |
17 | There are multiple types of Agents that take actions differently:
18 |
19 | | Type of Agent | Description |
20 | |------------------------|--------------------------------------------------------------------------------------------------|
21 | | JSON Agent | The Action to take is specified in JSON format. |
22 | | Code Agent | The Agent writes a code block that is interpreted externally. |
23 | | Function-calling Agent | It is a subcategory of the JSON Agent which has been fine-tuned to generate a new message for each action. |
24 |
25 | Actions themselves can serve many purposes:
26 |
27 | | Type of Action | Description |
28 | |--------------------------|------------------------------------------------------------------------------------------|
29 | | Information Gathering | Performing web searches, querying databases, or retrieving documents. |
30 | | Tool Usage | Making API calls, running calculations, and executing code. |
31 | | Environment Interaction | Manipulating digital interfaces or controlling physical devices. |
32 | | Communication | Engaging with users via chat or collaborating with other agents. |
33 |
34 | One crucial part of an agent is the **ability to STOP generating new tokens when an action is complete**, and that is true for all formats of Agent: JSON, code, or function-calling. This prevents unintended output and ensures that the agent’s response is clear and precise.
35 |
36 | The LLM only handles text and uses it to describe the action it wants to take and the parameters to supply to the tool.
37 |
38 | ## The Stop and Parse Approach
39 |
40 | One key method for implementing actions is the **stop and parse approach**. This method ensures that the agent’s output is structured and predictable:
41 |
42 | 1. **Generation in a Structured Format**:
43 |
44 | The agent outputs its intended action in a clear, predetermined format (JSON or code).
45 |
46 | 2. **Halting Further Generation**:
47 |
48 | Once the action is complete, **the agent stops generating additional tokens**. This prevents extra or erroneous output.
49 |
50 | 3. **Parsing the Output**:
51 |
52 | An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.
53 |
54 | For example, an agent needing to check the weather might output:
55 |
56 |
57 | ```json
58 | Thought: I need to check the current weather for New York.
59 | Action:
60 | {
61 | "action": "get_weather",
62 | "action_input": {"location": "New York"}
63 | }
64 | ```
65 | The framework can then easily parse the name of the function to call and the arguments to apply.
66 |
67 | This clear, machine-readable format minimizes errors and enables external tools to accurately process the agent’s command.
68 |
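As a toy illustration of what such a parser might do (this is not the code of any particular framework), consider:

```python
import json
import re

raw_output = """Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}"""

# Extract the JSON blob that follows the "Action:" marker
match = re.search(r"Action\s*:\s*(\{.*\})", raw_output, re.DOTALL)
action = json.loads(match.group(1))

# Dispatch to the corresponding tool (a dummy registry for the example)
tools = {"get_weather": lambda location: f"Sunny, 22°C in {location}"}
observation = tools[action["action"]](**action["action_input"])
print(observation)  # -> Sunny, 22°C in New York
```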
69 | Note: Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments.
70 | We'll dive deeper into those types of Agents in a future Unit.
71 |
72 | ## Code Agents
73 |
74 | An alternative approach is using *Code Agents*.
75 | The idea is: **instead of outputting a simple JSON object**, a Code Agent generates an **executable code block—typically in a high-level language like Python**.
76 |
77 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/code-vs-json-actions.png" alt="Code Agents" />
78 |
79 | This approach offers several advantages:
80 |
81 | - **Expressiveness:** Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
82 | - **Modularity and Reusability:** Generated code can include functions and modules that are reusable across different actions or tasks.
83 | - **Enhanced Debuggability:** With a well-defined programming syntax, code errors are often easier to detect and correct.
84 | - **Direct Integration:** Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.
85 |
86 | For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:
87 |
88 | ```python
89 | # Code Agent Example: Retrieve Weather Information
90 | def get_weather(city):
91 | import requests
92 | api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
93 | response = requests.get(api_url)
94 | if response.status_code == 200:
95 | data = response.json()
96 | return data.get("weather", "No weather information available")
97 | else:
98 | return "Error: Unable to fetch weather data."
99 |
100 | # Execute the function and prepare the final answer
101 | result = get_weather("New York")
102 | final_answer = f"The current weather in New York is: {result}"
103 | print(final_answer)
104 | ```
105 |
106 | In this example, the Code Agent:
107 |
108 | - Retrieves weather data **via an API call**,
109 | - Processes the response,
110 | - And uses the `print()` function to output a final answer.
111 |
112 | This method **also follows the stop and parse approach** by clearly delimiting the code block and signaling when execution is complete (here, by printing the `final_answer`).
113 |
114 | ---
115 |
116 | We learned that Actions bridge an agent's internal reasoning and its real-world interactions by executing clear, structured tasks—whether through JSON, code, or function calls.
117 |
118 | This deliberate execution ensures that each action is precise and ready for external processing via the stop and parse approach. In the next section, we will explore Observations to see how agents capture and integrate feedback from their environment.
119 |
120 | After this, we will **finally be ready to build our first Agent!**
121 |
122 |
123 |
124 |
125 |
126 |
127 |
--------------------------------------------------------------------------------
/units/en/unit1/agent-steps-and-structure.mdx:
--------------------------------------------------------------------------------
1 | # Understanding AI Agents through the Thought-Action-Observation Cycle
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-3.jpg" alt="Unit 1 planning"/>
4 |
5 | In the previous sections, we learned:
6 |
7 | - **How tools are made available to the agent in the system prompt**.
8 | - **How AI agents are systems that can 'reason', plan, and interact with their environment**.
9 |
10 | In this section, **we’ll explore the complete AI Agent Workflow**, a cycle we defined as Thought-Action-Observation.
11 |
12 | And then, we'll dive deeper into each of these steps.
13 |
14 |
15 | ## The Core Components
16 |
17 | Agents work in a continuous cycle of: **thinking (Thought) → acting (Act) → observing (Observe)**.
18 | 
19 | Let's break down these steps together:
20 |
21 | 1. **Thought**: The LLM part of the Agent decides what the next step should be.
22 | 2. **Action:** The agent takes an action, by calling the tools with the associated arguments.
23 | 3. **Observation:** The model reflects on the response from the tool.
24 |
25 | ## The Thought-Action-Observation Cycle
26 |
27 | The three components work together in a continuous loop. To use an analogy from programming, the agent uses a **while loop**: the loop continues until the objective of the agent has been fulfilled.
28 |
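To make the analogy concrete, here is a minimal sketch of that while loop; `llm`, `execute`, and `is_final` are hypothetical placeholders standing in for the model call, the tool executor, and the stopping check rather than real course code:

```python
def run_agent(task: str, llm, execute, is_final, max_steps: int = 10) -> str:
    """Minimal Thought → Act → Observe loop (illustrative only)."""
    memory = [f"Task: {task}"]
    for _ in range(max_steps):                        # a bounded "while" loop
        step = llm("\n".join(memory))                 # Thought: decide the next action
        if is_final(step):                            # objective fulfilled?
            return step
        observation = execute(step)                   # Act: run the chosen tool
        memory.append(step)
        memory.append(f"Observation: {observation}")  # Observe: append the feedback
    return "Stopped: maximum number of steps reached."
```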
29 | Visually, it looks like this:
30 |
31 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AgentCycle.gif" alt="Think, Act, Observe cycle"/>
32 |
33 | In many Agent frameworks, **the rules and guidelines are embedded directly into the system prompt**, ensuring that every cycle adheres to a defined logic.
34 |
35 | In a simplified version, our system prompt may look like this:
36 |
37 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/system_prompt_cycle.png" alt="Think, Act, Observe cycle"/>
38 |
39 | We see here that in the System Message we defined:
40 |
41 | - The *Agent's behavior*.
42 | - The *Tools our Agent has access to*, as we described in the previous section.
43 | - The *Thought-Action-Observation Cycle*, that we bake into the LLM instructions.
44 |
45 | Let's take a small example to understand the process before going deeper into each of its steps.
46 |
47 | ## Alfred, the weather Agent
48 |
49 | We created Alfred, the Weather Agent.
50 |
51 | A user asks Alfred: “What’s the current weather in New York?”
52 |
53 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent.jpg" alt="Alfred Agent"/>
54 |
55 | Alfred’s job is to answer this query using a weather API tool.
56 |
57 | Here’s how the cycle unfolds:
58 |
59 | ### Thought
60 |
61 | **Internal Reasoning:**
62 |
63 | Upon receiving the query, Alfred’s internal dialogue might be:
64 |
65 | *"The user needs current weather information for New York. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details."*
66 |
67 | This step shows the agent breaking the problem into steps: first, gathering the necessary data.
68 |
69 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-1.jpg" alt="Alfred Agent"/>
70 |
71 | ### Action
72 |
73 | **Tool Usage:**
74 |
75 | Based on its reasoning and the fact that Alfred knows about a `get_weather` tool, Alfred prepares a JSON-formatted command that calls the weather API tool. For example, its first action could be:
76 |
77 | Thought: I need to check the current weather for New York.
78 |
79 | ```json
80 | {
81 | "action": "get_weather",
82 | "action_input": {
83 | "location": "New York"
84 | }
85 | }
86 | ```
87 |
88 | Here, the action clearly specifies which tool to call (e.g., `get_weather`) and what parameter to pass ("location": "New York").
89 |
90 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-2.jpg" alt="Alfred Agent"/>
91 |
92 | ### Observation
93 |
94 | **Feedback from the Environment:**
95 |
96 | After the tool call, Alfred receives an observation. This might be the raw weather data from the API such as:
97 |
98 | *"Current weather in New York: partly cloudy, 15°C, 60% humidity."*
99 |
100 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-3.jpg" alt="Alfred Agent"/>
101 |
102 | This observation is then added to the prompt as additional context. It functions as real-world feedback, confirming whether the action succeeded and providing the needed details.
103 |
104 |
105 | ### Updated thought
106 |
107 | **Reflecting:**
108 |
109 | With the observation in hand, Alfred updates its internal reasoning:
110 |
111 | *"Now that I have the weather data for New York, I can compile an answer for the user."*
112 |
113 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-4.jpg" alt="Alfred Agent"/>
114 |
115 |
116 | ### Final Action
117 |
118 | Alfred then generates a final response formatted as we told it to:
119 |
120 | Thought: I have the weather data now. The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity.
121 | 
122 | Final answer: The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity.
123 |
124 | This final action sends the answer back to the user, closing the loop.
125 |
126 |
127 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-5.jpg" alt="Alfred Agent"/>
128 |
129 |
130 | What we see in this example:
131 |
132 | - **Agents iterate through a loop until the objective is fulfilled:**
133 |
134 | **Alfred’s process is cyclical**. It starts with a thought, then acts by calling a tool, and finally observes the outcome. If the observation had indicated an error or incomplete data, Alfred could have re-entered the cycle to correct its approach.
135 |
136 | - **Tool Integration:**
137 |
138 | The ability to call a tool (like a weather API) enables Alfred to go **beyond static knowledge and retrieve real-time data**, an essential aspect of many AI Agents.
139 |
140 | - **Dynamic Adaptation:**
141 |
142 | Each cycle allows the agent to incorporate fresh information (observations) into its reasoning (thought), ensuring that the final answer is well-informed and accurate.
143 |
144 | This example showcases the core concept behind the *ReAct cycle* (a concept we're going to develop in the next section): **the interplay of Thought, Action, and Observation empowers AI agents to solve complex tasks iteratively**.
145 |
146 | By understanding and applying these principles, you can design agents that not only reason about their tasks but also **effectively utilize external tools to complete them**, all while continuously refining their output based on environmental feedback.
147 |
148 | ---
149 |
150 | Let's now dive deeper into Thought, Action, and Observation as the individual steps of the process.
151 |
--------------------------------------------------------------------------------
/units/en/unit1/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion [[conclusion]]
2 |
3 | Congratulations on finishing this first Unit 🥳
4 |
5 | You've just **mastered the fundamentals of Agents** and you've created your first AI Agent!
6 |
7 | It's **normal if you still feel confused by some of these elements**. Agents are a complex topic and it's common to take a while to grasp everything.
8 |
9 | **Take time to really grasp the material** before continuing. It’s important to master these elements and have a solid foundation before entering the fun part.
10 |
11 | And if you pass the quiz, don't forget to get your certificate 🎓 👉 [here](https://huggingface.co/spaces/agents-course/unit1-certification-app)
12 |
13 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/>
14 |
15 | In the next (bonus) unit, you're going to learn **to fine-tune an Agent to do function calling (i.e., to be able to call tools based on the user prompt)**.
16 |
17 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
18 |
19 | ### Keep Learning, stay awesome 🤗
--------------------------------------------------------------------------------
/units/en/unit1/final-quiz.mdx:
--------------------------------------------------------------------------------
1 | # Unit 1 Quiz
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub4DONE.jpg" alt="Unit 1 planning"/>
4 |
5 | Well done on working through the first unit! Let's test your understanding of the key concepts covered so far.
6 |
7 | When you pass the quiz, proceed to the next section to claim your certificate.
8 |
9 | Good luck!
10 |
11 | ## Quiz
12 |
13 | Here is the interactive quiz, hosted in a Space on the Hugging Face Hub. It will take you through a set of multiple-choice questions to test your understanding of the key concepts covered in this unit. Once you've completed the quiz, you'll be able to see your score and a breakdown of the correct answers.
14 |
15 | One important thing: **don't forget to click Submit once you've finished; otherwise, your exam score will not be saved!**
16 |
17 | <iframe
18 | src="https://agents-course-unit-1-quiz.hf.space"
19 | frameborder="0"
20 | width="850"
21 | height="450"
22 | ></iframe>
23 |
24 | You can also access the quiz 👉 [here](https://huggingface.co/spaces/agents-course/unit_1_quiz)
25 |
26 | ## Certificate
27 |
28 | Now that you have successfully passed the quiz, **you can get your certificate 🎓**
29 |
30 | When you complete the quiz, it will grant you access to a certificate of completion for this unit. You can download and share this certificate to showcase your progress in the course.
31 |
32 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub5DONE.jpg" alt="Unit 1 planning"/>
33 |
34 | Once you receive your certificate, you can add it to your LinkedIn 🧑‍💼 or share it on X, Bluesky, etc. **We would be super proud and would love to congratulate you if you tag @huggingface**! 🤗
35 |
--------------------------------------------------------------------------------
/units/en/unit1/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to Agents
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/thumbnail.jpg" alt="Thumbnail"/>
4 |
5 | Welcome to this first unit, where **you'll build a solid foundation in the fundamentals of AI Agents** including:
6 |
7 | - **Understanding Agents**
8 | - What is an Agent, and how does it work?
9 | - How do Agents make decisions using reasoning and planning?
10 |
11 | - **The Role of LLMs (Large Language Models) in Agents**
12 | - How LLMs serve as the “brain” behind an Agent.
13 | - How LLMs structure conversations via the Messages system.
14 |
15 | - **Tools and Actions**
16 | - How Agents use external tools to interact with the environment.
17 | - How to build and integrate tools for your Agent.
18 |
19 | - **The Agent Workflow:**
20 | - *Think* → *Act* → *Observe*.
21 |
22 | After exploring these topics, **you’ll build your first Agent** using `smolagents`!
23 |
24 | Your Agent, named Alfred, will handle a simple task and demonstrate how to apply these concepts in practice.
25 |
26 | You’ll even learn how to **publish your Agent on Hugging Face Spaces**, so you can share it with friends and colleagues.
27 |
28 | Finally, at the end of this Unit, you'll take a quiz. Pass it, and you'll **earn your first course certification**: the 🎓 Certificate of Fundamentals of Agents.
29 |
30 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/>
31 |
32 | This Unit is your **essential starting point**, laying the groundwork for understanding Agents before you move on to more advanced topics.
33 |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="Unit 1 planning"/>
35 |
36 | It's a big unit, so **take your time** and don’t hesitate to come back to these sections from time to time.
37 |
38 | Ready? Let’s dive in! 🚀
39 |
--------------------------------------------------------------------------------
/units/en/unit1/observations.mdx:
--------------------------------------------------------------------------------
1 | # Observe: Integrating Feedback to Reflect and Adapt
2 |
3 | Observations are **how an Agent perceives the consequences of its actions**.
4 |
5 | They provide crucial information that fuels the Agent's thought process and guides future actions.
6 |
7 | They are **signals from the environment**—whether it’s data from an API, error messages, or system logs—that guide the next cycle of thought.
8 |
9 | In the observation phase, the agent:
10 |
11 | - **Collects Feedback:** Receives data or confirmation that its action was successful (or not).
12 | - **Appends Results:** Integrates the new information into its existing context, effectively updating its memory.
13 | - **Adapts its Strategy:** Uses this updated context to refine subsequent thoughts and actions.
14 |
15 | For example, if a weather API returns the data *"partly cloudy, 15°C, 60% humidity"*, this observation is appended to the agent’s memory (at the end of the prompt).
16 |
17 | The Agent then uses it to decide whether additional information is needed or if it’s ready to provide a final answer.
18 |
19 | This **iterative incorporation of feedback ensures the agent remains dynamically aligned with its goals**, constantly learning and adjusting based on real-world outcomes.
20 |
21 | These observations **can take many forms**, from reading webpage text to monitoring a robot arm's position. You can think of them as Tool "logs" that provide textual feedback on the Action's execution.
22 |
23 | | Type of Observation | Example |
24 | |---------------------|---------------------------------------------------------------------------|
25 | | System Feedback | Error messages, success notifications, status codes |
26 | | Data Changes | Database updates, file system modifications, state changes |
27 | | Environmental Data | Sensor readings, system metrics, resource usage |
28 | | Response Analysis | API responses, query results, computation outputs |
29 | | Time-based Events | Deadlines reached, scheduled tasks completed |
30 |
31 | ## How Are the Results Appended?
32 |
33 | After performing an action, the framework follows these steps in order (a minimal sketch follows the list):
34 |
35 | 1. **Parse the action** to identify the function(s) to call and the argument(s) to use.
36 | 2. **Execute the action.**
37 | 3. **Append the result** as an **Observation**.
38 |
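As an illustration only (this is not the code of any particular framework), the three steps could be sketched like this, with a toy `get_weather` tool standing in for a real API:

```python
import json

# A toy tool registry; `get_weather` here is a stand-in, not a real API call.
TOOLS = {"get_weather": lambda location: f"Weather in {location}: partly cloudy, 15°C"}

def observe(action_json: str, memory: list) -> list:
    action = json.loads(action_json)            # 1. Parse the action
    tool = TOOLS[action["action"]]
    result = tool(**action["action_input"])     # 2. Execute the action
    memory.append(f"Observation: {result}")     # 3. Append the result as an Observation
    return memory
```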
39 | ---
40 | We've now learned the Agent's Thought-Action-Observation Cycle.
41 |
42 | If some aspects still seem a bit blurry, don't worry—we'll revisit and deepen these concepts in future Units.
43 |
44 | Now, it's time to put your knowledge into practice by coding your very first Agent!
45 |
--------------------------------------------------------------------------------
/units/en/unit1/quiz1.mdx:
--------------------------------------------------------------------------------
1 | ### Q1: What is an Agent?
2 | Which of the following best describes an AI Agent?
3 |
4 | <Question
5 | choices={[
6 | {
7 | text: "An AI model that can reason, plan, and use tools to interact with its environment to achieve a specific goal.",
8 | explain: "This definition captures the essential characteristics of an Agent.",
9 | correct: true
10 | },
11 | {
12 | text: "A system that solely processes static text, without any inherent mechanism to interact dynamically with its surroundings or execute meaningful actions.",
13 | explain: "An Agent must be able to take an action and interact with its environment.",
14 | },
15 | {
16 | text: "A conversational agent restricted to answering queries, lacking the ability to perform any actions or interact with external systems.",
17 | explain: "A chatbot like this lacks the ability to take actions, making it different from an Agent.",
18 | },
19 | {
20 | text: "An online repository of information that offers static content without the capability to execute tasks or interact actively with users.",
21 | explain: "An Agent actively interacts with its environment rather than just providing static information.",
22 | }
23 | ]}
24 | />
25 |
26 | ---
27 |
28 | ### Q2: What is the Role of Planning in an Agent?
29 | Why does an Agent need to plan before taking an action?
30 |
31 | <Question
32 | choices={[
33 | {
34 | text: "To primarily store or recall past interactions, rather than mapping out a sequence of future actions.",
35 | explain: "Planning is about determining future actions, not storing past interactions.",
36 | },
37 | {
38 | text: "To decide on the sequence of actions and select appropriate tools needed to fulfill the user’s request.",
39 | explain: "Planning helps the Agent determine the best steps and tools to complete a task.",
40 | correct: true
41 | },
42 | {
43 | text: "To execute a sequence of arbitrary and uncoordinated actions that lack any defined strategy or intentional objective.",
44 | explain: "Planning ensures the Agent's actions are intentional and not random.",
45 | },
46 | {
47 | text: "To merely convert or translate text, bypassing any process of formulating a deliberate sequence of actions or employing strategic reasoning.",
48 | explain: "Planning is about structuring actions, not just converting text.",
49 | }
50 | ]}
51 | />
52 |
53 | ---
54 |
55 | ### Q3: How Do Tools Enhance an Agent's Capabilities?
56 | Why are tools essential for an Agent?
57 |
58 | <Question
59 | choices={[
60 | {
61 | text: "Tools serve no real purpose and do not contribute to the Agent’s ability to perform actions beyond basic text generation.",
62 | explain: "Tools expand an Agent's capabilities by allowing it to perform actions beyond text generation.",
63 | },
64 | {
65 | text: "Tools are solely designed for memory storage, lacking any capacity to facilitate the execution of tasks or enhance interactive performance.",
66 | explain: "Tools are primarily for performing actions, not just for storing data.",
67 | },
68 | {
69 | text: "Tools severely restrict the Agent exclusively to generating text, thereby preventing it from engaging in a broader range of interactive actions.",
70 | explain: "On the contrary, tools allow Agents to go beyond text-based responses.",
71 | },
72 | {
73 | text: "Tools provide the Agent with the ability to execute actions a text-generation model cannot perform natively, such as making coffee or generating images.",
74 | explain: "Tools enable Agents to interact with the real world and complete tasks.",
75 | correct: true
76 | }
77 | ]}
78 | />
79 |
80 | ---
81 |
82 | ### Q4: How Do Actions Differ from Tools?
83 | What is the key difference between Actions and Tools?
84 |
85 | <Question
86 | choices={[
87 | {
88 | text: "Actions are the steps the Agent takes, while Tools are external resources the Agent can use to perform those actions.",
89 | explain: "Actions are higher-level objectives, while Tools are specific functions the Agent can call upon.",
90 | correct: true
91 | },
92 | {
93 | text: "Actions and Tools are entirely identical components that can be used interchangeably, with no clear differences between them.",
94 | explain: "No, Actions are goals or tasks, while Tools are specific utilities the Agent uses to achieve them.",
95 | },
96 | {
97 | text: "Tools are considered broad utilities available for various functions, whereas Actions are mistakenly thought to be restricted only to physical interactions.",
98 | explain: "Not necessarily. Actions can involve both digital and physical tasks.",
99 | },
100 | {
101 | text: "Actions inherently require the use of LLMs to be determined and executed, whereas Tools are designed to function autonomously without such dependencies.",
102 | explain: "While LLMs help decide Actions, Actions themselves are not dependent on LLMs.",
103 | }
104 | ]}
105 | />
106 |
107 | ---
108 |
109 | ### Q5: What Role Do Large Language Models (LLMs) Play in Agents?
110 | How do LLMs contribute to an Agent’s functionality?
111 |
112 | <Question
113 | choices={[
114 | {
115 | text: "LLMs function merely as passive repositories that store information, lacking any capability to actively process input or produce dynamic responses.",
116 | explain: "LLMs actively process text input and generate responses, rather than just storing information.",
117 | },
118 | {
119 | text: "LLMs serve as the reasoning 'brain' of the Agent, processing text inputs to understand instructions and plan actions.",
120 | explain: "LLMs enable the Agent to interpret, plan, and decide on the next steps.",
121 | correct: true
122 | },
123 | {
124 | text: "LLMs are erroneously believed to be used solely for image processing, when in fact their primary function is to process and generate text.",
125 | explain: "LLMs primarily work with text, although they can sometimes interact with multimodal inputs.",
126 | },
127 | {
128 | text: "LLMs are considered completely irrelevant to the operation of AI Agents, implying that they are entirely superfluous in any practical application.",
129 | explain: "LLMs are a core component of modern AI Agents.",
130 | }
131 | ]}
132 | />
133 |
134 | ---
135 |
136 | ### Q6: Which of the Following Best Demonstrates an AI Agent?
137 | Which real-world example best illustrates an AI Agent at work?
138 |
139 | <Question
140 | choices={[
141 | {
142 | text: "A static FAQ page on a website that provides fixed information and lacks any interactive or dynamic response capabilities.",
143 | explain: "A static FAQ page does not interact dynamically with users or take actions.",
144 | },
145 | {
146 | text: "A simple calculator that performs arithmetic operations based on fixed rules, without any capability for reasoning or planning.",
147 | explain: "A calculator follows fixed rules without reasoning or planning, so it is not an Agent.",
148 | },
149 | {
150 | text: "A virtual assistant like Siri or Alexa that can understand spoken commands, reason through them, and perform tasks like setting reminders or sending messages.",
151 | explain: "This example includes reasoning, planning, and interaction with the environment.",
152 | correct: true
153 | },
154 | {
155 | text: "A video game NPC that operates on a fixed script of responses, without the ability to reason, plan, or use external tools.",
156 | explain: "Unless the NPC can reason, plan, and use tools, it does not function as an AI Agent.",
157 | }
158 | ]}
159 | />
160 |
161 | ---
162 |
163 | Congrats on finishing this Quiz 🥳! If you need to review any elements, take the time to revisit the chapter to reinforce your knowledge before diving deeper into the "Agent's brain": LLMs.
164 |
--------------------------------------------------------------------------------
/units/en/unit1/quiz2.mdx:
--------------------------------------------------------------------------------
1 | # Quick Self-Check (ungraded) [[quiz2]]
2 |
3 |
4 | What?! Another Quiz? We know, we know, ... 😅 But this short, ungraded quiz is here to **help you reinforce key concepts you've just learned**.
5 |
6 | This quiz covers Large Language Models (LLMs), message systems, and tools, all essential components for understanding and building AI agents.
7 |
8 | ### Q1: Which of the following best describes an AI tool?
9 |
10 | <Question
11 | choices={[
12 | {
13 | text: "A process that only generates text responses",
14 | explain: "",
15 | },
16 | {
17 | text: "An executable process or external API that allows agents to perform specific tasks and interact with external environments",
18 | explain: "Tools are executable functions that agents can use to perform specific tasks and interact with external environments.",
19 | correct: true
20 | },
21 | {
22 | text: "A feature that stores agent conversations",
23 | explain: "",
24 | }
25 | ]}
26 | />
27 |
28 | ---
29 |
30 | ### Q2: How do AI agents use tools as a form of "acting" in an environment?
31 |
32 | <Question
33 | choices={[
34 | {
35 | text: "By passively waiting for user instructions",
36 | explain: "",
37 | },
38 | {
39 | text: "By only using pre-programmed responses",
40 | explain: "",
41 | },
42 | {
43 | text: "By asking the LLM to generate tool invocation code when appropriate and running tools on behalf of the model",
44 | explain: "Agents can invoke tools and use reasoning to plan and re-plan based on the information gained.",
45 | correct: true
46 | }
47 | ]}
48 | />
49 |
50 | ---
51 |
52 | ### Q3: What is a Large Language Model (LLM)?
53 |
54 | <Question
55 | choices={[
56 | {
57 | text: "A simple chatbot designed to respond with pre-defined answers",
58 | explain: "",
59 | },
60 | {
61 | text: "A deep learning model trained on large amounts of text to understand and generate human-like language",
62 | explain: "",
63 | correct: true
64 | },
65 | {
66 | text: "A rule-based AI that follows strict predefined commands",
67 | explain: "",
68 | }
69 | ]}
70 | />
71 |
72 | ---
73 |
74 | ### Q4: Which of the following best describes the role of special tokens in LLMs?
75 |
76 | <Question
77 | choices={[
78 | {
79 | text: "They are additional words stored in the model's vocabulary to enhance text generation quality",
80 | explain: "",
81 | },
82 | {
83 | text: "They serve specific functions like marking the end of a sequence (EOS) or separating different message roles in chat models",
84 | explain: "",
85 | correct: true
86 | },
87 | {
88 | text: "They are randomly inserted tokens used to improve response variability",
89 | explain: "",
90 | }
91 | ]}
92 | />
93 |
94 | ---
95 |
96 | ### Q5: How do AI chat models process user messages internally?
97 |
98 | <Question
99 | choices={[
100 | {
101 | text: "They directly interpret messages as structured commands with no transformations",
102 | explain: "",
103 | },
104 | {
105 | text: "They convert user messages into a formatted prompt by concatenating system, user, and assistant messages",
106 | explain: "",
107 | correct: true
108 | },
109 | {
110 | text: "They generate responses randomly based on previous conversations",
111 | explain: "",
112 | }
113 | ]}
114 | />
115 |
116 | ---
117 |
118 |
119 | Got it? Great! Now let's **dive into the complete Agent flow and start building your first AI Agent!**
120 |
--------------------------------------------------------------------------------
/units/en/unit1/thoughts.mdx:
--------------------------------------------------------------------------------
1 | # Thought: Internal Reasoning and the ReAct Approach
2 |
3 | <Tip>
4 | In this section, we dive into the inner workings of an AI agent—its ability to reason and plan. We’ll explore how the agent leverages its internal dialogue to analyze information, break down complex problems into manageable steps, and decide what action to take next. Additionally, we introduce the ReAct approach, a prompting technique that encourages the model to think “step by step” before acting.
5 | </Tip>
6 |
7 | Thoughts represent the **Agent's internal reasoning and planning processes** to solve the task.
8 |
9 | This utilises the agent's Large Language Model (LLM) capacity **to analyze information when presented in its prompt**.
10 |
11 | Think of it as the agent's internal dialogue, where it considers the task at hand and strategizes its approach.
12 |
13 | The Agent's thoughts are responsible for assessing current observations and deciding what the next action(s) should be.
14 |
15 | Through this process, the agent can **break down complex problems into smaller, more manageable steps**, reflect on past experiences, and continuously adjust its plans based on new information.
16 |
17 | Here are some examples of common thoughts:
18 |
19 | | Type of Thought | Example |
20 | |----------------|---------|
21 | | Planning | "I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report" |
22 | | Analysis | "Based on the error message, the issue appears to be with the database connection parameters" |
23 | | Decision Making | "Given the user's budget constraints, I should recommend the mid-tier option" |
24 | | Problem Solving | "To optimize this code, I should first profile it to identify bottlenecks" |
25 | | Memory Integration | "The user mentioned their preference for Python earlier, so I'll provide examples in Python" |
26 | | Self-Reflection | "My last approach didn't work well, I should try a different strategy" |
27 | | Goal Setting | "To complete this task, I need to first establish the acceptance criteria" |
28 | | Prioritization | "The security vulnerability should be addressed before adding new features" |
29 |
30 | > **Note:** In the case of LLMs fine-tuned for function-calling, the thought process is optional.
31 | > *In case you're not familiar with function-calling, there will be more details in the Actions section.*
32 |
33 | ## The ReAct Approach
34 |
35 | A key method is the **ReAct approach**, which is the concatenation of "Reasoning" (Think) with "Acting" (Act).
36 |
37 | ReAct is a simple prompting technique that appends "Let's think step by step" before letting the LLM decode the next tokens.
38 |
39 | Indeed, prompting the model to think "step by step" encourages the decoding process toward next tokens **that generate a plan**, rather than a final solution, since the model is encouraged to **decompose** the problem into *sub-tasks*.
40 |
41 | This allows the model to consider sub-steps in more detail, which generally leads to fewer errors than trying to generate the final solution directly.
42 |
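As a tiny, purely illustrative sketch (the `generate` call in the comment is a placeholder, not a real API), the technique boils down to appending that phrase to the prompt before decoding:

```python
def react_prompt(task: str) -> str:
    """Wrap a task with a 'think step by step' instruction, ReAct-style."""
    return f"Question: {task}\nLet's think step by step."

# e.g. completion = generate(react_prompt("How many hours are there in a leap year?"))
```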
43 | <figure>
44 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/ReAct.png" alt="ReAct"/>
45 | <figcaption>Panel (d) is an example of the ReAct approach, where we prompt "Let's think step by step."
46 | </figcaption>
47 | </figure>
48 |
49 | <Tip>
50 | We have recently seen a lot of interest in reasoning strategies. This is what's behind models like DeepSeek R1 or OpenAI's o1, which have been fine-tuned to "think before answering".
51 |
52 | These models have been trained to always include specific _thinking_ sections (enclosed between `<think>` and `</think>` special tokens). This is not just a prompting technique like ReAct, but a training method where the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do.
53 | </Tip>
54 |
55 | ---
56 | Now that we better understand the Thought process, let's go deeper on the second part of the process: Act.
57 |
--------------------------------------------------------------------------------
/units/en/unit1/tutorial.mdx:
--------------------------------------------------------------------------------
1 | # Let's Create Our First Agent Using smolagents
2 |
3 | In the last section, we learned how we can create Agents from scratch using Python code, and we **saw just how tedious that process can be**. Fortunately, many Agent libraries simplify this work by **handling much of the heavy lifting for you**.
4 |
5 | In this tutorial, **you'll create your very first Agent** capable of performing actions such as image generation, web search, time zone checking and much more!
6 |
7 | You will also publish your agent **on a Hugging Face Space so you can share it with friends and colleagues**.
8 |
9 | Let's get started!
10 |
11 |
12 | ## What is smolagents?
13 |
14 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/smolagents.png" alt="smolagents"/>
15 |
16 | To make this Agent, we're going to use `smolagents`, a library that **provides a framework for developing your agents with ease**.
17 |
18 | This lightweight library is designed for simplicity, but it abstracts away much of the complexity of building an Agent, allowing you to focus on designing your agent's behavior.
19 |
20 | We're going to dig deeper into smolagents in the next Unit. Meanwhile, you can also check this <a href="https://huggingface.co/blog/smolagents" target="_blank">blog post</a> or the library's <a href="https://github.com/huggingface/smolagents" target="_blank">repo on GitHub</a>.
21 |
22 | In short, `smolagents` is a library that focuses on the **CodeAgent**, a kind of agent that performs **"Actions"** through code blocks and then **"Observes"** results by executing the code.
23 |
24 | Here is an example of what we'll build!
25 |
26 | We provided our agent with an **Image generation tool** and asked it to generate an image of a cat.
27 |
28 | The agent inside `smolagents` is going to have the **same behaviors as the custom one we built previously**: it's going **to think, act, and observe in a cycle** until it reaches a final answer:
29 |
30 | <iframe width="560" height="315" src="https://www.youtube.com/embed/PQDKcWiuln4?si=ysSTDZoi8y55FVvA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
31 |
32 | Exciting, right?
33 |
34 | ## Let's build our Agent!
35 |
36 | To start, duplicate this Space: <a href="https://huggingface.co/spaces/agents-course/First_agent_template" target="_blank">https://huggingface.co/spaces/agents-course/First_agent_template</a>
37 | > Thanks to <a href="https://huggingface.co/m-ric" target="_blank">Aymeric</a> for this template! 🙌
38 |
39 |
40 | Duplicating this space means **creating a local copy on your own profile**:
41 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/duplicate-space.gif" alt="Duplicate"/>
42 |
43 | After duplicating the Space, you'll need to add your Hugging Face API token so your agent can access the model API:
44 |
45 | 1. First, get your Hugging Face token from [https://hf.co/settings/tokens](https://hf.co/settings/tokens) with permission for inference, if you don't already have one
46 | 2. Go to your duplicated Space and click on the **Settings** tab
47 | 3. Scroll down to the **Variables and Secrets** section and click **New Secret**
48 | 4. Create a secret with the name `HF_TOKEN` and paste your token as the value
49 | 5. Click **Save** to store your token securely
50 |
51 | Throughout this lesson, the only file you will need to modify is the (currently incomplete) **"app.py"**. You can see here the [original one in the template](https://huggingface.co/spaces/agents-course/First_agent_template/blob/main/app.py). To find yours, go to your copy of the space, then click the `Files` tab and then on `app.py` in the directory listing.
52 |
53 | Let's break down the code together:
54 |
55 | - The file begins with some simple but necessary library imports
56 |
57 | ```python
58 | from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, HfApiModel, load_tool, tool
59 | import datetime
60 | import requests
61 | import pytz
62 | import yaml
63 | ```
64 |
65 | As outlined earlier, we will directly use the **CodeAgent** class from **smolagents**.
66 |
67 |
68 | ### The Tools
69 |
70 | Now let's get into the tools! If you want a refresher about tools, don't hesitate to go back to the [Tools](tools) section of the course.
71 |
72 | ```python
73 | @tool
74 | def my_custom_tool(arg1:str, arg2:int)-> str: # it's important to specify the return type
75 | # Keep this format for the tool description / args description but feel free to modify the tool
76 | """A tool that does nothing yet
77 | Args:
78 | arg1: the first argument
79 | arg2: the second argument
80 | """
81 | return "What magic will you build ?"
82 |
83 | @tool
84 | def get_current_time_in_timezone(timezone: str) -> str:
85 | """A tool that fetches the current local time in a specified timezone.
86 | Args:
87 | timezone: A string representing a valid timezone (e.g., 'America/New_York').
88 | """
89 | try:
90 | # Create timezone object
91 | tz = pytz.timezone(timezone)
92 | # Get current time in that timezone
93 | local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
94 | return f"The current local time in {timezone} is: {local_time}"
95 | except Exception as e:
96 | return f"Error fetching time for timezone '{timezone}': {str(e)}"
97 | ```
98 |
99 |
100 | The Tools are what we are encouraging you to build in this section! We give you two examples:
101 |
102 | 1. A **non-working dummy Tool** that you can modify to make something useful.
103 | 2. An **actually working Tool** that gets the current time somewhere in the world.
104 |
105 | To define your tool, it is important to:
106 | 
107 | 1. Provide input and output types for your function, like in `get_current_time_in_timezone(timezone: str) -> str:`
108 | 2. Provide **a well-formatted docstring**. `smolagents` expects all the arguments to have a **textual description in the docstring**.
109 |
110 | ### The Agent
111 |
112 | The Agent uses [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) as its LLM engine. This is a very capable model that we'll access via the serverless API.
113 |
114 | ```python
115 | final_answer = FinalAnswerTool()
116 | model = HfApiModel(
117 | max_tokens=2096,
118 | temperature=0.5,
119 | model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
120 | custom_role_conversions=None,
121 | )
122 |
123 | with open("prompts.yaml", 'r') as stream:
124 | prompt_templates = yaml.safe_load(stream)
125 |
126 | # We're creating our CodeAgent
127 | agent = CodeAgent(
128 | model=model,
129 | tools=[final_answer], # add your tools here (don't remove final_answer)
130 | max_steps=6,
131 | verbosity_level=1,
132 | grammar=None,
133 | planning_interval=None,
134 | name=None,
135 | description=None,
136 | prompt_templates=prompt_templates
137 | )
138 |
139 | GradioUI(agent).launch()
140 | ```
141 |
142 | This Agent still uses the `InferenceClient` we saw in an earlier section behind the **HfApiModel** class!
143 |
144 | We will give more in-depth examples when we present the framework in Unit 2. For now, you need to focus on **adding new tools to the list of tools** using the `tools` parameter of your Agent.
145 |
146 | For example, you could use the `DuckDuckGoSearchTool` that was imported in the first line of the code, or you can examine the `image_generation_tool` that is loaded from the Hub later in the code.
147 |
148 | **Adding tools will give your agent new capabilities**, try to be creative here!
149 |
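For instance, one possible configuration, keeping `final_answer` in the list and assuming the tools defined earlier in the file, might look like this:

```python
agent = CodeAgent(
    model=model,
    tools=[
        final_answer,                  # required: lets the agent return its final answer
        DuckDuckGoSearchTool(),        # web search, already imported at the top of app.py
        image_generation_tool,         # loaded from the Hub with load_tool
        get_current_time_in_timezone,  # the @tool-decorated function defined above
        my_custom_tool,
    ],
    max_steps=6,
    verbosity_level=1,
    prompt_templates=prompt_templates
)
```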
150 | ### The System Prompt
151 |
152 | The agent's system prompt is stored in a separate `prompts.yaml` file. This file contains predefined instructions that guide the agent's behavior.
153 |
154 | Storing prompts in a YAML file allows for easy customization and reuse across different agents or use cases.
155 |
156 | You can check the [Space's file structure](https://huggingface.co/spaces/agents-course/First_agent_template/tree/main) to see where the `prompts.yaml` file is located and how it's organized within the project.
157 |
158 | The complete "app.py":
159 |
160 | ```python
161 | from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
162 | import datetime
163 | import requests
164 | import pytz
165 | import yaml
166 | from tools.final_answer import FinalAnswerTool
167 |
168 | from Gradio_UI import GradioUI
169 |
170 | # Below is an example of a tool that does nothing. Amaze us with your creativity!
171 | @tool
172 | def my_custom_tool(arg1:str, arg2:int)-> str: # it's important to specify the return type
173 | # Keep this format for the tool description / args description but feel free to modify the tool
174 | """A tool that does nothing yet
175 | Args:
176 | arg1: the first argument
177 | arg2: the second argument
178 | """
179 | return "What magic will you build ?"
180 |
181 | @tool
182 | def get_current_time_in_timezone(timezone: str) -> str:
183 | """A tool that fetches the current local time in a specified timezone.
184 | Args:
185 | timezone: A string representing a valid timezone (e.g., 'America/New_York').
186 | """
187 | try:
188 | # Create timezone object
189 | tz = pytz.timezone(timezone)
190 | # Get current time in that timezone
191 | local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
192 | return f"The current local time in {timezone} is: {local_time}"
193 | except Exception as e:
194 | return f"Error fetching time for timezone '{timezone}': {str(e)}"
195 |
196 |
197 | final_answer = FinalAnswerTool()
198 | model = HfApiModel(
199 | max_tokens=2096,
200 | temperature=0.5,
201 | model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
202 | custom_role_conversions=None,
203 | )
204 |
205 |
206 | # Import tool from Hub
207 | image_generation_tool = load_tool("agents-course/text-to-image", trust_remote_code=True)
208 |
209 | # Load system prompt from prompt.yaml file
210 | with open("prompts.yaml", 'r') as stream:
211 | prompt_templates = yaml.safe_load(stream)
212 |
213 | agent = CodeAgent(
214 | model=model,
215 | tools=[final_answer], # add your tools here (don't remove final_answer)
216 | max_steps=6,
217 | verbosity_level=1,
218 | grammar=None,
219 | planning_interval=None,
220 | name=None,
221 | description=None,
222 | prompt_templates=prompt_templates # Pass system prompt to CodeAgent
223 | )
224 |
225 |
226 | GradioUI(agent).launch()
227 | ```
228 |
229 | Your **Goal** is to get familiar with the Space and the Agent.
230 |
231 | Currently, the agent in the template **does not use any tools, so try to provide it with some of the pre-made ones or even make some new tools yourself!**
232 |
233 | We are eagerly waiting to see your amazing agents' output in the Discord channel **#agents-course-showcase**!
234 |
235 |
236 | ---
237 | Congratulations, you've built your first Agent! Don't hesitate to share it with your friends and colleagues.
238 |
239 | Since this is your first try, it's perfectly normal if it's a little buggy or slow. In future units, we'll learn how to build even better Agents.
240 |
241 | The best way to learn is to try, so don't hesitate to update it, add more tools, try with another model, etc.
242 |
243 | In the next section, you're going to fill the final Quiz and get your certificate!
244 |
245 |
--------------------------------------------------------------------------------
/units/en/unit1/what-are-agents.mdx:
--------------------------------------------------------------------------------
1 | # What is an Agent?
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="Unit 1 planning"/>
4 |
5 | By the end of this section, you'll feel comfortable with the concept of agents and their various applications in AI.
6 |
7 | To explain what an Agent is, let's start with an analogy.
8 |
9 | ## The Big Picture: Alfred The Agent
10 |
11 | Meet Alfred. Alfred is an **Agent**.
12 |
13 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/this-is-alfred.jpg" alt="This is Alfred"/>
14 |
15 | Imagine Alfred **receives a command**, such as: "Alfred, I would like a coffee please."
16 |
17 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/coffee-please.jpg" alt="I would like a coffee"/>
18 |
19 | Because Alfred **understands natural language**, he quickly grasps our request.
20 |
21 | Before fulfilling the order, Alfred engages in **reasoning and planning**, figuring out the steps and tools he needs to:
22 |
23 | 1. Go to the kitchen
24 | 2. Use the coffee machine
25 | 3. Brew the coffee
26 | 4. Bring the coffee back
27 |
28 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/reason-and-plan.jpg" alt="Reason and plan"/>
29 |
30 | Once he has a plan, he **must act**. To execute his plan, **he can use tools from the list of tools he knows about**.
31 |
32 | In this case, to make a coffee, he uses a coffee machine. He activates the coffee machine to brew the coffee.
33 |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/make-coffee.jpg" alt="Make coffee"/>
35 |
36 | Finally, Alfred brings the freshly brewed coffee to us.
37 |
38 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/bring-coffee.jpg" alt="Bring coffee"/>
39 |
40 | And this is what an Agent is: an **AI model capable of reasoning, planning, and interacting with its environment**.
41 |
42 | We call it an Agent because it has _agency_, that is, the ability to interact with its environment.
43 |
44 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/process.jpg" alt="Agent process"/>
45 |
46 | ## Let's go more formal
47 |
48 | Now that you have the big picture, here’s a more precise definition:
49 |
50 | > An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.
51 |
52 | Think of the Agent as having two main parts:
53 |
54 | 1. **The Brain (AI Model)**
55 |
56 | This is where all the thinking happens. The AI model **handles reasoning and planning**.
57 | It decides **which Actions to take based on the situation**.
58 |
59 | 2. **The Body (Capabilities and Tools)**
60 |
61 | This part represents **everything the Agent is equipped to do**.
62 |
63 | The **scope of possible actions** depends on what the agent **has been equipped with**. For example, because humans lack wings, they can't perform the "fly" **Action**, but they can execute **Actions** like "walk", "run", "jump", "grab", and so on.
64 |
65 | ### The spectrum of "Agency"
66 |
67 | Following this definition, Agents exist on a continuous spectrum of increasing agency:
68 |
69 | | Agency Level | Description | What that's called | Example pattern |
70 | | --- | --- | --- | --- |
71 | | ☆☆☆ | Agent output has no impact on program flow | Simple processor | `process_llm_output(llm_response)` |
72 | | ★☆☆ | Agent output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` |
73 | | ★★☆ | Agent output determines function execution | Tool caller | `run_function(llm_chosen_tool, llm_chosen_args)` |
74 | | ★★★ | Agent output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` |
75 | | ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` |
76 |
77 | Table from [smolagents conceptual guide](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents).
78 |
79 |
80 | ## What type of AI Models do we use for Agents?
81 |
82 | The most common AI model found in Agents is an LLM (Large Language Model), which takes **Text** as an input and outputs **Text** as well.
83 |
84 | Well-known examples are **GPT-4** from **OpenAI**, **Llama** from **Meta**, **Gemini** from **Google**, etc. These models have been trained on a vast amount of text and are able to generalize well. We will learn more about LLMs in the [next section](what-are-llms).
85 |
86 | <Tip>
87 | It's also possible to use models that accept other inputs as the Agent's core model. For example, a Vision Language Model (VLM), which is like an LLM but also understands images as input. We'll focus on LLMs for now and will discuss other options later.
88 | </Tip>
89 |
90 | ## How does an AI take action on its environment?
91 |
92 | LLMs are amazing models, but **they can only generate text**.
93 |
94 | However, if you ask a well-known chat application like HuggingChat or ChatGPT to generate an image, they can! How is that possible?
95 |
96 | The answer is that the developers of HuggingChat, ChatGPT and similar apps implemented additional functionality (called **Tools**), that the LLM can use to create images.
97 |
98 | <figure>
99 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/eiffel_brocolis.jpg" alt="Eiffel Brocolis"/>
100 | <figcaption>The model used an Image Generation Tool to generate this image.
101 | </figcaption>
102 | </figure>
103 |
104 | We will learn more about tools in the [Tools](tools) section.
105 |
106 | ## What type of tasks can an Agent do?
107 |
108 | An Agent can perform any task we implement via **Tools** to complete **Actions**.
109 |
110 | For example, if I write an Agent to act as my personal assistant (like Siri) on my computer, and I ask it to "send an email to my Manager asking to delay today's meeting", I can give it some code to send emails. This will be a new Tool the Agent can use whenever it needs to send an email. We can write it in Python:
111 |
112 | ```python
113 | def send_message_to(recipient, message):
114 | """Useful to send an e-mail message to a recipient"""
115 | ...
116 | ```
117 |
118 | The LLM, as we'll see, will generate code to run the tool when it needs to, and thus fulfill the desired task.
119 |
120 | ```python
121 | send_message_to("Manager", "Can we postpone today's meeting?")
122 | ```
123 |
124 | The **design of the Tools is very important and has a great impact on the quality of your Agent**. Some tasks will require very specific Tools to be crafted, while others may be solved with general purpose tools like "web_search".
125 |
126 | > Note that **Actions are not the same as Tools**. An Action, for instance, can involve the use of multiple Tools to complete.
127 |
128 | Allowing an agent to interact with its environment **allows real-life usage for companies and individuals**.
129 |
130 | ### Example 1: Personal Virtual Assistants
131 |
132 | Virtual assistants like Siri, Alexa, or Google Assistant work as agents when they interact on behalf of users within their digital environments.
133 |
134 | They take user queries, analyze context, retrieve information from databases, and provide responses or initiate actions (like setting reminders, sending messages, or controlling smart devices).
135 |
136 | ### Example 2: Customer Service Chatbots
137 |
138 | Many companies deploy chatbots as agents that interact with customers in natural language.
139 |
140 | These agents can answer questions, guide users through troubleshooting steps, open issues in internal databases, or even complete transactions.
141 |
142 | Their predefined objectives might include improving user satisfaction, reducing wait times, or increasing sales conversion rates. By interacting directly with customers, learning from the dialogues, and adapting their responses over time, they demonstrate the core principles of an agent in action.
143 |
144 |
145 | ### Example 3: AI Non-Playable Character in a video game
146 |
147 | AI agents powered by LLMs can make Non-Playable Characters (NPCs) more dynamic and unpredictable.
148 |
149 | Instead of following rigid behavior trees, they can **respond contextually, adapt to player interactions**, and generate more nuanced dialogue. This flexibility helps create more lifelike, engaging characters that evolve alongside the player’s actions.
150 |
151 | ---
152 |
153 | To summarize, an Agent is a system that uses an AI Model (typically an LLM) as its core reasoning engine, to:
154 |
155 | - **Understand natural language:** Interpret and respond to human instructions in a meaningful way.
156 |
157 | - **Reason and plan:** Analyze information, make decisions, and devise strategies to solve problems.
158 |
159 | - **Interact with its environment:** Gather information, take actions, and observe the results of those actions.
160 |
161 | Now that you have a solid grasp of what Agents are, let’s reinforce your understanding with a short, ungraded quiz. After that, we’ll dive into the “Agent’s brain”: the [LLMs](what-are-llms).
162 |
--------------------------------------------------------------------------------
/units/en/unit2/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to Agentic Frameworks
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/thumbnail.jpg" alt="Thumbnail"/>
4 |
5 | Welcome to this second unit, where **we'll explore different agentic frameworks** that can be used to build powerful agentic applications.
6 |
7 | We will study:
8 |
9 | - In Unit 2.1: [smolagents](https://huggingface.co/docs/smolagents/en/index)
10 | - In Unit 2.2: [LlamaIndex](https://www.llamaindex.ai/)
11 | - In Unit 2.3: [LangGraph](https://www.langchain.com/langgraph)
12 |
13 | Let's dive in! 🕵
14 |
15 | ## When to Use an Agentic Framework
16 |
17 | An agentic framework is **not always needed when building an application around LLMs**. Frameworks provide flexibility in the workflow to efficiently solve a specific task, but they're not always necessary.
18 |
19 | Sometimes, **predefined workflows are sufficient** to fulfill user requests, and there is no real need for an agentic framework. If the approach to build an agent is simple, like a chain of prompts, using plain code may be enough. The advantage is that the developer will have **full control and understanding of their system without abstractions**.
20 |
21 | However, when the workflow becomes more complex, such as letting an LLM call functions or using multiple agents, these abstractions start to become helpful.
22 |
23 | Considering these ideas, we can already identify the need for some features:
24 |
25 | * An *LLM engine* that powers the system.
26 | * A *list of tools* the agent can access.
27 | * A *parser* for extracting tool calls from the LLM output.
28 | * A *system prompt* synced with the parser.
29 | * A *memory system*.
30 | * *Error logging and retry mechanisms* to control LLM mistakes.
31 | 
32 | We'll explore how these topics are resolved in various frameworks, including `smolagents`, `LlamaIndex`, and `LangGraph`. A minimal sketch of these ingredients follows.
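Purely as an illustration of that list (none of these names come from a real framework), those ingredients could be grouped like this:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class MinimalAgentSetup:
    llm: Callable[[str], str]                                  # the LLM engine powering the system
    tools: Dict[str, Callable] = field(default_factory=dict)   # the tools the agent can access
    parse_tool_call: Optional[Callable[[str], dict]] = None    # parser for tool calls in LLM output
    system_prompt: str = ""                                    # kept in sync with the parser
    memory: List[str] = field(default_factory=list)            # the memory system
    max_retries: int = 3                                       # basic control over LLM mistakes
```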
32 |
33 | ## Agentic Frameworks Units
34 |
35 | | Framework | Description | Unit Author |
36 | |------------|----------------|----------------|
37 | | [smolagents](./smolagents/introduction) | Agents framework developed by Hugging Face. | Sergio Paniego - [HF](https://huggingface.co/sergiopaniego) - [X](https://x.com/sergiopaniego) - [Linkedin](https://www.linkedin.com/in/sergio-paniego-blanco) |
38 | | [Llama-Index](./llama-index/introduction) | End-to-end tooling to ship a context-augmented AI agent to production | David Berenstein - [HF](https://huggingface.co/davidberenstein1957) - [X](https://x.com/davidberenstei) - [Linkedin](https://www.linkedin.com/in/davidberenstein) |
39 | | [LangGraph](./langgraph/introduction) | Framework allowing stateful orchestration of agents | Joffrey THOMAS - [HF](https://huggingface.co/Jofthomas) - [X](https://x.com/Jthmas404) - [Linkedin](https://www.linkedin.com/in/joffrey-thomas) |
40 |
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/building_blocks.mdx:
--------------------------------------------------------------------------------
1 | # Building Blocks of LangGraph
2 |
3 | To build applications with LangGraph, you need to understand its core components. Let's explore the fundamental building blocks that make up a LangGraph application.
4 |
5 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Building_blocks.png" alt="Building Blocks" width="70%"/>
6 |
7 | An application in LangGraph starts from an **entrypoint**, and depending on the execution, the flow may go to one function or another until it reaches the END.
8 |
9 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/application.png" alt="Application"/>
10 |
11 | ## 1. State
12 |
13 | **State** is the central concept in LangGraph. It represents all the information that flows through your application.
14 |
15 | ```python
16 | from typing_extensions import TypedDict
17 |
18 | class State(TypedDict):
19 | graph_state: str
20 | ```
21 |
22 | The state is **user-defined**, so its fields should be carefully crafted to contain all the data needed for the decision-making process!
23 |
24 | > 💡 **Tip:** Think carefully about what information your application needs to track between steps.
25 |
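For instance, a slightly richer state for the e-mail workflow built at the end of this module might look like this (the field names are illustrative, not the exact ones used later):

```python
from typing import Optional
from typing_extensions import TypedDict

class EmailState(TypedDict):
    email_text: str             # raw input to classify
    is_spam: Optional[bool]     # decision produced by a classification node
    draft_reply: Optional[str]  # only filled in for genuine e-mails
```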
26 | ## 2. Nodes
27 |
28 | **Nodes** are Python functions. Each node:
29 | - Takes the state as input
30 | - Performs some operation
31 | - Returns updates to the state
32 |
33 | ```python
34 | def node_1(state):
35 | print("---Node 1---")
36 | return {"graph_state": state['graph_state'] +" I am"}
37 |
38 | def node_2(state):
39 | print("---Node 2---")
40 | return {"graph_state": state['graph_state'] +" happy!"}
41 |
42 | def node_3(state):
43 | print("---Node 3---")
44 | return {"graph_state": state['graph_state'] +" sad!"}
45 | ```
46 |
47 | For example, Nodes can contain:
48 | - **LLM calls**: Generate text or make decisions
49 | - **Tool calls**: Interact with external systems
50 | - **Conditional logic**: Determine next steps
51 | - **Human intervention**: Get input from users
52 |
53 | > 💡 **Info:** Some nodes needed by every workflow, such as START and END, are provided by LangGraph directly.
54 |
55 |
56 | ## 3. Edges
57 |
58 | **Edges** connect nodes and define the possible paths through your graph:
59 |
60 | ```python
61 | import random
62 | from typing import Literal
63 |
64 | def decide_mood(state) -> Literal["node_2", "node_3"]:
65 |
66 | # Often, we will use state to decide on the next node to visit
67 | user_input = state['graph_state']
68 |
69 | # Here, let's just do a 50 / 50 split between nodes 2, 3
70 | if random.random() < 0.5:
71 |
72 | # 50% of the time, we return Node 2
73 | return "node_2"
74 |
75 | # 50% of the time, we return Node 3
76 | return "node_3"
77 | ```
78 |
79 | Edges can be:
80 | - **Direct**: Always go from node A to node B
81 | - **Conditional**: Choose the next node based on the current state
82 |
83 | ## 4. StateGraph
84 |
85 | The **StateGraph** is the container that holds your entire agent workflow:
86 |
87 | ```python
88 | from IPython.display import Image, display
89 | from langgraph.graph import StateGraph, START, END
90 |
91 | # Build graph
92 | builder = StateGraph(State)
93 | builder.add_node("node_1", node_1)
94 | builder.add_node("node_2", node_2)
95 | builder.add_node("node_3", node_3)
96 |
97 | # Logic
98 | builder.add_edge(START, "node_1")
99 | builder.add_conditional_edges("node_1", decide_mood)
100 | builder.add_edge("node_2", END)
101 | builder.add_edge("node_3", END)
102 |
103 | # Compile the graph
104 | graph = builder.compile()
105 | ```
106 |
107 | The compiled graph can then be visualized:
108 | ```python
109 | # View
110 | display(Image(graph.get_graph().draw_mermaid_png()))
111 | ```
112 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/basic_graph.jpeg" alt="Graph Visualization"/>
113 |
114 | But most importantly, it can be invoked:
115 | ```python
116 | graph.invoke({"graph_state" : "Hi, this is Lance."})
117 | ```
118 | Output:
119 | ```
120 | ---Node 1---
121 | ---Node 3---
122 | {'graph_state': 'Hi, this is Lance. I am sad!'}
123 | ```
124 |
125 | ## What's Next?
126 |
127 | In the next section, we'll put these concepts into practice by building our first graph. This graph lets Alfred take in your e-mails, classify them, and craft a preliminary answer if they are genuine.
128 |
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion
2 |
3 | Congratulations on finishing the `LangGraph` module of this second Unit! 🥳
4 |
5 | You've now mastered the fundamentals of building structured workflows with LangGraph that you can take to production.
6 |
7 | This module is just the beginning of your journey with LangGraph. For more advanced topics, we recommend:
8 |
9 | - Exploring the [official LangGraph documentation](https://github.com/langchain-ai/langgraph)
10 | - Taking the comprehensive [Introduction to LangGraph](https://academy.langchain.com/courses/intro-to-langgraph) course from LangChain Academy
11 | - Building something yourself!
12 |
13 | In the next Unit, you'll explore real use cases. It's time to leave theory behind and get into real action!
14 |
15 | We would greatly appreciate **your thoughts on the course and suggestions for improvement**. If you have feedback, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
16 |
17 | ### Keep Learning, Stay Awesome! 🤗
18 |
19 | Good Sir/Madam! 🎩🦇
20 |
21 | -Alfred-
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/document_analysis_agent.mdx:
--------------------------------------------------------------------------------
1 | # Document Analysis Graph
2 |
3 | Alfred at your service. As Mr. Wayne's trusted butler, I've taken the liberty of documenting how I assist him with his various document-related needs. While he's out attending to his... nighttime activities, I ensure all his paperwork, training schedules, and nutritional plans are properly analyzed and organized.
4 |
5 | Before leaving, he left a note with his weekly training program. I then took it upon myself to come up with a **menu** for tomorrow's meals.
6 |
7 | For future events like this, let's create a document analysis system using LangGraph to serve Mister Wayne's needs. This system can:
8 |
9 | 1. Process image documents
10 | 2. Extract text using vision models (Vision Language Models)
11 | 3. Perform calculations when needed (to demonstrate normal tools)
12 | 4. Analyze content and provide concise summaries
13 | 5. Execute specific instructions related to documents
14 |
15 | ## The Butler's Workflow
16 |
17 | The workflow we'll build follows this structured schema:
18 |
19 | ![Butler's Document Analysis Workflow](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/alfred_flow.png)
20 |
21 | <Tip>
22 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/langgraph/agent.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
23 | </Tip>
24 |
25 | ## Setting Up the environment
26 |
27 | ```python
28 | %pip install langgraph langchain_openai langchain_core
29 | ```
30 | and the imports:
31 | ```python
32 | import base64
33 | from typing import List, TypedDict, Annotated, Optional
34 | from langchain_openai import ChatOpenAI
35 | from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage
36 | from langgraph.graph.message import add_messages
37 | from langgraph.graph import START, StateGraph
38 | from langgraph.prebuilt import ToolNode, tools_condition
39 | from IPython.display import Image, display
40 | ```
41 |
42 | ## Defining Agent's State
43 |
44 | This state is a little more complex than the previous ones we have seen.
45 | `AnyMessage` is a class from LangChain that defines messages, and `add_messages` is an operator that appends the latest message rather than overwriting it with the latest state.
46 |
47 | This is a new concept in LangGraph, where you can add operators to your state to define how its fields should be updated.
48 |
49 | ```python
50 | class AgentState(TypedDict):
51 | # The document provided
52 | input_file: Optional[str] # Contains file path (PDF/PNG)
53 | messages: Annotated[list[AnyMessage], add_messages]
54 | ```
55 |
56 | ## Preparing Tools
57 |
58 | ```python
59 | vision_llm = ChatOpenAI(model="gpt-4o")
60 |
61 | def extract_text(img_path: str) -> str:
62 | """
63 | Extract text from an image file using a multimodal model.
64 |
65 | Master Wayne often leaves notes with his training regimen or meal plans.
66 | This allows me to properly analyze the contents.
67 | """
68 | all_text = ""
69 | try:
70 | # Read image and encode as base64
71 | with open(img_path, "rb") as image_file:
72 | image_bytes = image_file.read()
73 |
74 | image_base64 = base64.b64encode(image_bytes).decode("utf-8")
75 |
76 | # Prepare the prompt including the base64 image data
77 | message = [
78 | HumanMessage(
79 | content=[
80 | {
81 | "type": "text",
82 | "text": (
83 | "Extract all the text from this image. "
84 | "Return only the extracted text, no explanations."
85 | ),
86 | },
87 | {
88 | "type": "image_url",
89 | "image_url": {
90 | "url": f"data:image/png;base64,{image_base64}"
91 | },
92 | },
93 | ]
94 | )
95 | ]
96 |
97 | # Call the vision-capable model
98 | response = vision_llm.invoke(message)
99 |
100 | # Append extracted text
101 | all_text += response.content + "\n\n"
102 |
103 | return all_text.strip()
104 | except Exception as e:
105 | # A butler should handle errors gracefully
106 | error_msg = f"Error extracting text: {str(e)}"
107 | print(error_msg)
108 | return ""
109 |
110 | def divide(a: int, b: int) -> float:
111 | """Divide a and b - for Master Wayne's occasional calculations."""
112 | return a / b
113 |
114 | # Equip the butler with tools
115 | tools = [
116 | divide,
117 | extract_text
118 | ]
119 |
120 | llm = ChatOpenAI(model="gpt-4o")
121 | llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)
122 | ```
123 |
124 | ## The Nodes
125 |
126 | ```python
127 | def assistant(state: AgentState):
128 | # System message
129 | textual_description_of_tool="""
130 | extract_text(img_path: str) -> str:
131 | Extract text from an image file using a multimodal model.
132 |
133 | Args:
134 | img_path: A local image file path (strings).
135 |
136 | Returns:
137 | A single string containing the concatenated text extracted from each image.
138 | divide(a: int, b: int) -> float:
139 | Divide a and b
140 | """
141 |     image = state["input_file"]
142 |     sys_msg = SystemMessage(content=f"You are a helpful butler named Alfred that serves Mr. Wayne and Batman. You can analyse documents and run computations with provided tools:\n{textual_description_of_tool} \n You have access to some optional images. Currently the loaded image is: {image}")
143 |
144 | return {
145 | "messages": [llm_with_tools.invoke([sys_msg] + state["messages"])],
146 | "input_file": state["input_file"]
147 | }
148 | ```
149 |
150 | ## The ReAct Pattern: How I Assist Mr. Wayne
151 |
152 | Allow me to explain the approach in this agent. The agent follows what's known as the ReAct pattern (Reason-Act-Observe):
153 |
154 | 1. **Reason** about his documents and requests
155 | 2. **Act** by using appropriate tools
156 | 3. **Observe** the results
157 | 4. **Repeat** as necessary until I've fully addressed his needs
158 |
159 | This is a simple implementation of an agent using LangGraph.
160 |
161 | ```python
162 | # The graph
163 | builder = StateGraph(AgentState)
164 |
165 | # Define nodes: these do the work
166 | builder.add_node("assistant", assistant)
167 | builder.add_node("tools", ToolNode(tools))
168 |
169 | # Define edges: these determine how the control flow moves
170 | builder.add_edge(START, "assistant")
171 | builder.add_conditional_edges(
172 | "assistant",
173 | # If the latest message requires a tool, route to tools
174 | # Otherwise, provide a direct response
175 | tools_condition,
176 | )
177 | builder.add_edge("tools", "assistant")
178 | react_graph = builder.compile()
179 |
180 | # Show the butler's thought process
181 | display(Image(react_graph.get_graph(xray=True).draw_mermaid_png()))
182 | ```
183 |
184 | We define a `tools` node with our list of tools. The `assistant` node is just our model with bound tools.
185 | We create a graph with `assistant` and `tools` nodes.
186 |
187 | We add a `tools_condition` edge, which routes to `END` or to `tools` based on whether the `assistant` calls a tool.
188 |
189 | Now, we add one new step:
190 |
191 | We connect the `tools` node back to the `assistant`, forming a loop.
192 |
193 | - After the `assistant` node executes, `tools_condition` checks if the model's output is a tool call.
194 | - If it is a tool call, the flow is directed to the `tools` node.
195 | - The `tools` node connects back to `assistant`.
196 | - This loop continues as long as the model decides to call tools.
197 | - If the model response is not a tool call, the flow is directed to END, terminating the process.
198 |
199 | ![ReAct Pattern](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Agent.png)
200 |
201 | ## The Butler in Action
202 |
203 | ### Example 1: Simple Calculations
204 |
205 | Here is an example to show a simple use case of an agent using a tool in LangGraph.
206 |
207 | ```python
208 | messages = [HumanMessage(content="Divide 6790 by 5")]
209 | messages = react_graph.invoke({"messages": messages, "input_file": None})
210 |
211 | # Show the messages
212 | for m in messages['messages']:
213 | m.pretty_print()
214 | ```
215 |
216 | The conversation would proceed:
217 |
218 | ```
219 | Human: Divide 6790 by 5
220 |
221 | AI Tool Call: divide(a=6790, b=5)
222 |
223 | Tool Response: 1358.0
224 |
225 | Alfred: The result of dividing 6790 by 5 is 1358.0.
226 | ```
227 |
228 | ### Example 2: Analyzing Master Wayne's Training Documents
229 |
230 | When Master Wayne leaves his training and meal notes:
231 |
232 | ```python
233 | messages = [HumanMessage(content="According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu?")]
234 | messages = react_graph.invoke({"messages": messages, "input_file": "Batman_training_and_meals.png"})
235 | ```
236 |
237 | The interaction would proceed:
238 |
239 | ```
240 | Human: According to the note provided by Mr. Wayne in the provided images. What's the list of items I should buy for the dinner menu?
241 |
242 | AI Tool Call: extract_text(img_path="Batman_training_and_meals.png")
243 |
244 | Tool Response: [Extracted text with training schedule and menu details]
245 |
246 | Alfred: For the dinner menu, you should buy the following items:
247 |
248 | 1. Grass-fed local sirloin steak
249 | 2. Organic spinach
250 | 3. Piquillo peppers
251 | 4. Potatoes (for oven-baked golden herb potato)
252 | 5. Fish oil (2 grams)
253 |
254 | Ensure the steak is grass-fed and the spinach and peppers are organic for the best quality meal.
255 | ```
256 |
257 | ## Key Takeaways
258 |
259 | Should you wish to create your own document analysis butler, here are key considerations:
260 |
261 | 1. **Define clear tools** for specific document-related tasks
262 | 2. **Create a robust state tracker** to maintain context between tool calls
263 | 3. **Consider error handling** for tool failures
264 | 4. **Maintain contextual awareness** of previous interactions (ensured by the `add_messages` operator)
265 |
266 | With these principles, you too can provide exemplary document analysis service worthy of Wayne Manor.
267 |
268 | *I trust this explanation has been satisfactory. Now, if you'll excuse me, Master Wayne's cape requires pressing before tonight's activities.*
269 |
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to `LangGraph`
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/LangGraph.png" alt="Unit 2.3 Thumbnail"/>
4 |
5 | Welcome to this next part of our journey, where you'll learn **how to build applications** using the [`LangGraph`](https://github.com/langchain-ai/langgraph) framework, designed to help you structure and orchestrate complex LLM workflows.
6 |
7 | `LangGraph` is a framework that allows you to build **production-ready** applications by giving you **control** over the flow of your agent.
8 |
9 | ## Module Overview
10 |
11 | In this unit, you'll discover:
12 |
13 | ### 1️⃣ [What is LangGraph, and when to use it?](./when_to_use_langgraph)
14 | ### 2️⃣ [Building Blocks of LangGraph](./building_blocks)
15 | ### 3️⃣ [Alfred, the mail sorting butler](./first_graph)
16 | ### 4️⃣ [Alfred, the document Analyst agent](./document_analysis_agent)
17 | ### 5️⃣ [Quiz](./quiz1)
18 |
19 | <Tip warning={true}>
20 | The examples in this section require access to a powerful LLM/VLM. We ran them using the GPT-4o API because it has the best compatibility with LangGraph.
21 | </Tip>
22 |
23 | By the end of this unit, you'll be equipped to build robust, organized, and production-ready applications!
24 |
25 | That being said, this section is an introduction to LangGraph, and more advanced topics can be discovered in the free LangChain Academy course: [Introduction to LangGraph](https://academy.langchain.com/courses/intro-to-langgraph)
26 |
27 | Let's get started!
28 |
29 | ## Resources
30 |
31 | - [LangGraph Agents](https://langchain-ai.github.io/langgraph/) - Examples of LangGraph agents
32 | - [LangChain academy](https://academy.langchain.com/courses/intro-to-langgraph) - Full course on LangGraph from LangChain
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/quiz1.mdx:
--------------------------------------------------------------------------------
1 | # Test Your Understanding of LangGraph
2 |
3 | Let's test your understanding of `LangGraph` with a quick quiz! This will help reinforce the key concepts we've covered so far.
4 |
5 | This is an optional quiz and it's not graded.
6 |
7 | ### Q1: What is the primary purpose of LangGraph?
8 | Which statement best describes what LangGraph is designed for?
9 |
10 | <Question
11 | choices={[
12 | {
13 | text: "A framework to build control flows for applications containing LLMs",
14 | explain: "Correct! LangGraph is specifically designed to help build and manage the control flow of applications that use LLMs.",
15 | correct: true
16 | },
17 | {
18 | text: "A library that provides interfaces to interact with different LLM models",
19 | explain: "This better describes LangChain's role, which provides standard interfaces for model interaction. LangGraph focuses on control flow.",
20 | },
21 | {
22 | text: "An Agent library for tool calling",
23 |       explain: "While LangGraph works with agents, its main purpose is orchestration.",
24 | }
25 | ]}
26 | />
27 |
28 | ---
29 |
30 | ### Q2: In the context of the "Control vs Freedom" trade-off, where does LangGraph stand?
31 | Which statement best characterizes LangGraph's approach to agent design?
32 |
33 | <Question
34 | choices={[
35 | {
36 | text: "LangGraph maximizes freedom, allowing LLMs to make all decisions independently",
37 | explain: "LangGraph actually focuses more on control than freedom, providing structure for LLM workflows.",
38 | },
39 | {
40 | text: "LangGraph provides strong control over execution flow while still leveraging LLM capabilities for decision making",
41 | explain: "Correct! LangGraph shines when you need control over your agent's execution, providing predictable behavior through structured workflows.",
42 | correct: true
43 | },
44 | ]}
45 | />
46 |
47 | ---
48 |
49 | ### Q3: What role does State play in LangGraph?
50 | Choose the most accurate description of State in LangGraph.
51 |
52 | <Question
53 | choices={[
54 | {
55 | text: "State is the latest generation from the LLM",
56 |       explain: "State is a user-defined class in LangGraph, not generated by the LLM. Its fields are user defined; their values can be filled in by the LLM.",
57 | },
58 | {
59 | text: "State is only used to track errors during execution",
60 |       explain: "State has a much broader purpose than just error tracking, although tracking errors is still useful.",
61 | },
62 | {
63 | text: "State represents the information that flows through your agent application",
64 |       explain: "Correct! State is central to LangGraph and contains all the information needed for decision-making between steps. You define the fields you need, and the nodes can alter their values to decide on branching.",
65 | correct: true
66 | },
67 | {
68 | text: "State is only relevant when working with external APIs",
69 | explain: "State is fundamental to all LangGraph applications, not just those working with external APIs.",
70 | }
71 | ]}
72 | />
73 |
74 | ### Q4: What is a Conditional Edge in LangGraph?
75 | Select the most accurate description.
76 |
77 | <Question
78 | choices={[
79 | {
80 | text: "An edge that determines which node to execute next based on evaluating a condition",
81 | explain: "Correct! Conditional edges allow your graph to make dynamic routing decisions based on the current state, creating branching logic in your workflow.",
82 | correct: true
83 | },
84 | {
85 | text: "An edge that is only followed when a specific condition occurs",
86 |       explain: "Conditional edges route based on evaluating the current state (the outputs produced so far), not on waiting for a specific external condition to occur.",
87 | },
88 | {
89 | text: "An edge that requires user confirmation before proceeding",
90 | explain: "Conditional edges are based on programmatic conditions, not user interaction requirements.",
91 | }
92 | ]}
93 | />
94 |
95 | ---
96 |
97 | ### Q5: How does LangGraph help address the hallucination problem in LLMs?
98 | Choose the best answer.
99 |
100 | <Question
101 | choices={[
102 | {
103 | text: "LangGraph eliminates hallucinations entirely by limiting LLM responses",
104 | explain: "No framework can completely eliminate hallucinations from LLMs, LangGraph is no exception.",
105 | },
106 | {
107 | text: "LangGraph provides structured workflows that can validate and verify LLM outputs",
108 | explain: "Correct! By creating structured workflows with validation steps, verification nodes, and error handling paths, LangGraph helps reduce the impact of hallucinations.",
109 | correct: true
110 | },
111 | {
112 | text: "LangGraph has no effect on hallucinations",
113 | explain: "LangGraph's structured approach to workflows can help significantly in mitigating hallucinations at the cost of speed.",
114 | }
115 | ]}
116 | />
117 |
118 | Congratulations on completing the quiz! 🎉 If you missed any questions, consider reviewing the previous sections to strengthen your understanding. Next, we'll explore more advanced features of LangGraph and see how to build more complex agent workflows.
119 |
--------------------------------------------------------------------------------
/units/en/unit2/langgraph/when_to_use_langgraph.mdx:
--------------------------------------------------------------------------------
1 | # What is `LangGraph`?
2 |
3 | `LangGraph` is a framework developed by [LangChain](https://www.langchain.com/) **to manage the control flow of applications that integrate an LLM**.
4 |
5 | ## Is `LangGraph` different from `LangChain`?
6 |
7 | LangChain provides a standard interface to interact with models and other components, useful for retrieval, LLM calls, and tool calls.
8 | The classes from LangChain might be used in LangGraph, but do not HAVE to be used.
9 |
10 | The packages are different and can be used in isolation, but, in the end, all resources you will find online use both packages hand in hand.
11 |
12 | ## When should I use `LangGraph`?
13 | ### Control vs freedom
14 |
15 | When designing AI applications, you face a fundamental trade-off between **control** and **freedom**:
16 |
17 | - **Freedom** gives your LLM more room to be creative and tackle unexpected problems.
18 | - **Control** allows you to ensure predictable behavior and maintain guardrails.
19 |
20 | Code Agents, like the ones you can encounter in *smolagents*, are very free. They can call multiple tools in a single action step, create their own tools, etc. However, this behavior can make them less predictable and less controllable than a regular Agent working with JSON!
21 |
22 | `LangGraph` is at the other end of the spectrum: it shines when you need **"Control"** over the execution of your agent.
23 |
24 | LangGraph is particularly valuable when you need **Control over your applications**. It gives you the tools to build an application that follows a predictable process while still leveraging the power of LLMs.
25 |
26 | Put simply, if your application involves a series of steps that need to be orchestrated in a specific way, with decisions being made at each junction point, **LangGraph provides the structure you need**.
27 |
28 | As an example, let's say we want to build an LLM assistant that can answer some questions over some documents.
29 |
30 | Since LLMs understand text the best, before being able to answer the question, you will need to convert other complex modalities (charts, tables) into text. However, that choice depends on the type of document you have!
31 |
32 | This is a branching decision that I chose to represent as follows:
33 |
34 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/flow.png" alt="Control flow"/>
35 |
36 | > 💡 **Tip:** The left branch is not an agent, as no tool call is involved, but the right branch will need to write some code to query the xls file (convert it to pandas and manipulate it).
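
In plain Python, that deterministic branch might look like a simple dispatch on the document type. This is a hedged illustration only; `summarize_text` and `query_xls` are hypothetical placeholders, not code from the course:

```python
def summarize_text(path: str) -> str:
    # Hypothetical: send the raw text to an LLM for summarization.
    return f"summary of {path}"


def query_xls(path: str) -> str:
    # Hypothetical: load the spreadsheet with pandas and answer from it.
    return f"answer computed from {path}"


def route_document(path: str) -> str:
    """Deterministic branch: pick the processing path from the file type."""
    if path.endswith((".xls", ".xlsx")):
        return query_xls(path)  # right branch of the figure
    return summarize_text(path)  # left branch of the figure
```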
37 |
38 | While this branching is deterministic, you can also design branchings that are conditioned on the output of an LLM, making them non-deterministic.
39 |
40 | The key scenarios where LangGraph excels include:
41 |
42 | - **Multi-step reasoning processes** that need explicit control on the flow
43 | - **Applications requiring persistence of state** between steps
44 | - **Systems that combine deterministic logic with AI capabilities**
45 | - **Workflows that need human-in-the-loop interventions**
46 | - **Complex agent architectures** with multiple components working together
47 |
48 | In essence, if you can, **as a human**, lay out the flow of actions in advance and decide what to execute next based on the output of each action, then LangGraph is the correct framework for you!
49 |
50 | `LangGraph` is, in my opinion, the most production-ready agent framework on the market.
51 |
52 | ## How does LangGraph work?
53 |
54 | At its core, `LangGraph` uses a directed graph structure to define the flow of your application:
55 |
56 | - **Nodes** represent individual processing steps (like calling an LLM, using a tool, or making a decision).
57 | - **Edges** define the possible transitions between steps.
58 | - **State** is user defined, maintained, and passed between nodes during execution. It is the current state that we look at when deciding which node to target next.
59 |
60 | We will explore those fundamental blocks more in the next chapter!
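
As a tiny preview, a complete (if trivial) graph with a single node could look like the sketch below; the `StateGraph` API used here is exactly what the next chapter walks through:

```python
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    text: str


def shout(state: State) -> dict:
    # A node: takes the state, returns an update to it.
    return {"text": state["text"].upper()}


builder = StateGraph(State)
builder.add_node("shout", shout)
builder.add_edge(START, "shout")
builder.add_edge("shout", END)
graph = builder.compile()

print(graph.invoke({"text": "hello"}))  # {'text': 'HELLO'}
```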
61 |
62 | ## How is it different from regular python? Why do I need LangGraph?
63 |
64 | You might wonder: "I could just write regular Python code with if-else statements to handle all these flows, right?"
65 |
66 | While technically true, LangGraph offers **some advantages** over vanilla Python for building complex systems. You could build the same application without LangGraph, but LangGraph provides ready-made tools and abstractions that make it easier.
67 |
68 | It includes states, visualization, logging (traces), built-in human-in-the-loop, and more.
69 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/README.md:
--------------------------------------------------------------------------------
1 | # Table of Contents
2 |
3 | This LlamaIndex framework outline is part of unit 2 of the course. You can access the unit 2 section about LlamaIndex on hf.co/learn 👉 <a href="https://hf.co/learn/agents-course/unit2/llama-index/introduction">here</a>
4 |
5 | | Title | Description |
6 | | --- | --- |
7 | | [Introduction](introduction.mdx) | Introduction to LlamaIndex |
8 | | [LlamaHub](llama-hub.mdx) | LlamaHub: a registry of integrations, agents and tools |
9 | | [Components](components.mdx) | Components: the building blocks of workflows |
10 | | [Tools](tools.mdx) | Tools: how to build tools in LlamaIndex |
11 | | [Quiz 1](quiz1.mdx) | Quiz 1 |
12 | | [Agents](agents.mdx) | Agents: how to build agents in LlamaIndex |
13 | | [Workflows](workflows.mdx) | Workflows: sequences of steps and events, made of components, that are executed in order |
14 | | [Quiz 2](quiz2.mdx) | Quiz 2 |
15 | | [Conclusion](conclusion.mdx) | Conclusion |
16 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/agents.mdx:
--------------------------------------------------------------------------------
1 | # Using Agents in LlamaIndex
2 |
3 | Remember Alfred, our helpful butler agent from earlier? Well, he's about to get an upgrade!
4 | Now that we understand the tools available in LlamaIndex, we can give Alfred new capabilities to serve us better.
5 |
6 | But before we continue, let's remind ourselves what makes an agent like Alfred tick.
7 | Back in Unit 1, we learned that:
8 |
9 | > An Agent is a system that leverages an AI model to interact with its environment to achieve a user-defined objective. It combines reasoning, planning, and action execution (often via external tools) to fulfil tasks.
10 |
11 | LlamaIndex supports **three main types of reasoning agents:**
12 |
13 | ![Agents](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/agents.png)
14 |
15 | 1. `Function Calling Agents` - These work with AI models that can call specific functions.
16 | 2. `ReAct Agents` - These can work with any AI model that exposes a chat or text endpoint, and can deal with complex reasoning tasks.
17 | 3. `Advanced Custom Agents` - These use more complex methods to deal with more complex tasks and workflows.
18 |
19 | <Tip>Find more information on advanced agents on <a href="https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/workflow/base_agent.py">BaseWorkflowAgent</a></Tip>
20 |
21 | ## Initialising Agents
22 |
23 | <Tip>
24 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
25 | </Tip>
26 |
27 | To create an agent, we start by providing it with a **set of functions/tools that define its capabilities**.
28 | Let's look at how to create an agent with some basic tools. As of this writing, the agent will automatically use the function calling API (if available), or a standard ReAct agent loop.
29 |
30 | LLMs that support a tools/functions API are relatively new, but they provide a powerful way to call tools by avoiding specific prompting and allowing the LLM to create tool calls based on provided schemas.
31 |
32 | ReAct agents are also good at complex reasoning tasks and can work with any LLM that has chat or text completion capabilities. They are more verbose, and show the reasoning behind certain actions that they take.
33 |
34 | ```python
35 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
36 | from llama_index.core.agent.workflow import AgentWorkflow
37 | from llama_index.core.tools import FunctionTool
38 |
39 | # define sample Tool -- type annotations, function names, and docstrings, are all included in parsed schemas!
40 | def multiply(a: int, b: int) -> int:
41 | """Multiplies two integers and returns the resulting integer"""
42 | return a * b
43 |
44 | # initialize llm
45 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
46 |
47 | # initialize agent
48 | agent = AgentWorkflow.from_tools_or_functions(
49 | [FunctionTool.from_defaults(multiply)],
50 | llm=llm
51 | )
52 | ```
53 |
54 | **Agents are stateless by default**; remembering past interactions is opt-in using a `Context` object.
55 | This might be useful if you want to use an agent that needs to remember previous interactions, like a chatbot that maintains context across multiple messages or a task manager that needs to track progress over time.
56 |
57 | ```python
58 | # stateless
59 | response = await agent.run("What is 2 times 2?")
60 |
61 | # remembering state
62 | from llama_index.core.workflow import Context
63 |
64 | ctx = Context(agent)
65 |
66 | response = await agent.run("My name is Bob.", ctx=ctx)
67 | response = await agent.run("What was my name again?", ctx=ctx)
68 | ```
69 |
70 | You'll notice that agents in `LlamaIndex` are async because they use Python's `await` operator. If you are new to async code in Python, or need a refresher, they have an [excellent async guide](https://docs.llamaindex.ai/en/stable/getting_started/async_python/).
71 |
72 | Now we've gotten the basics, let's take a look at how we can use more complex tools in our agents.
73 |
74 | ## Creating RAG Agents with QueryEngineTools
75 |
76 | **Agentic RAG is a powerful way to use agents to answer questions about your data.** We can pass various tools to Alfred to help him answer questions.
77 | However, instead of answering the question on top of documents automatically, Alfred can decide to use any other tool or flow to answer the question.
78 |
79 | ![Agentic RAG](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/agentic-rag.png)
80 |
81 | It is easy to **wrap `QueryEngine` as a tool** for an agent.
82 | When doing so, we need to **define a name and description**. The LLM will use this information to correctly use the tool.
83 | Let's see how to load in a `QueryEngineTool` using the `QueryEngine` we created in the [component section](components).
84 |
85 | ```python
86 | from llama_index.core.tools import QueryEngineTool
87 |
88 | query_engine = index.as_query_engine(llm=llm, similarity_top_k=3) # as shown in the Components in LlamaIndex section
89 |
90 | query_engine_tool = QueryEngineTool.from_defaults(
91 | query_engine=query_engine,
92 | name="name",
93 | description="a specific description",
94 | return_direct=False,
95 | )
96 | query_engine_agent = AgentWorkflow.from_tools_or_functions(
97 | [query_engine_tool],
98 | llm=llm,
99 | system_prompt="You are a helpful assistant that has access to a database containing persona descriptions. "
100 | )
101 | ```
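
We can then run the agent like any other `AgentWorkflow`; the question below is only an illustrative example:

```python
# The agent can decide to call the query engine tool to answer the question.
response = await query_engine_agent.run(
    "Search the database for 'science fiction' and return some persona descriptions."
)
print(response)
```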
102 |
103 | ## Creating Multi-agent systems
104 |
105 | The `AgentWorkflow` class also directly supports multi-agent systems. By giving each agent a name and description, the system maintains a single active speaker, with each agent having the ability to hand off to another agent.
106 |
107 | By narrowing the scope of each agent, we can help increase their general accuracy when responding to user messages.
108 |
109 | **Agents in LlamaIndex can also directly be used as tools** for other agents, for more complex and custom scenarios.
110 |
111 | ```python
112 | from llama_index.core.agent.workflow import (
113 | AgentWorkflow,
114 | FunctionAgent,
115 | ReActAgent,
116 | )
117 |
118 | # Define some tools
119 | def add(a: int, b: int) -> int:
120 | """Add two numbers."""
121 | return a + b
122 |
123 |
124 | def subtract(a: int, b: int) -> int:
125 | """Subtract two numbers."""
126 | return a - b
127 |
128 |
129 | # Create agent configs
130 | # NOTE: we can use FunctionAgent or ReActAgent here.
131 | # FunctionAgent works for LLMs with a function calling API.
132 | # ReActAgent works for any LLM.
133 | calculator_agent = ReActAgent(
134 | name="calculator",
135 | description="Performs basic arithmetic operations",
136 | system_prompt="You are a calculator assistant. Use your tools for any math operation.",
137 | tools=[add, subtract],
138 | llm=llm,
139 | )
140 |
141 | query_agent = ReActAgent(
142 | name="info_lookup",
143 | description="Looks up information about XYZ",
144 | system_prompt="Use your tool to query a RAG system to answer information about XYZ",
145 | tools=[query_engine_tool],
146 | llm=llm
147 | )
148 |
149 | # Create and run the workflow
150 | agent = AgentWorkflow(
151 | agents=[calculator_agent, query_agent], root_agent="calculator"
152 | )
153 |
154 | # Run the system
155 | response = await agent.run(user_msg="Can you add 5 and 3?")
156 | ```
157 |
158 | <Tip>Haven't learned enough yet? There is a lot more to discover about agents and tools in LlamaIndex within the <a href="https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_basic/">AgentWorkflow Basic Introduction</a> or the <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/">Agent Learning Guide</a>, where you can read more about streaming, context serialization, and human-in-the-loop!</Tip>
159 |
160 | Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can use LlamaIndex to **create configurable and manageable workflows!**
161 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion
2 |
3 | Congratulations on finishing the `llama-index` module of this second Unit 🥳
4 |
5 | You’ve just mastered the fundamentals of `llama-index` and you’ve seen how to build your own agentic workflows!
6 | Now that you have skills in `llama-index`, you can start to create agentic workflows that solve tasks you're interested in.
7 |
8 | In the next module of the unit, you're going to learn **how to build Agents with LangGraph**.
9 |
10 | Finally, we would love **to hear what you think of the course and how we can improve it**.
11 | If you have some feedback then, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
12 |
13 | ### Keep Learning, and stay awesome 🤗
14 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to LlamaIndex
2 |
3 | Welcome to this module, where you’ll learn how to build LLM-powered agents using the [LlamaIndex](https://www.llamaindex.ai/) toolkit.
4 |
5 | LlamaIndex is **a complete toolkit for creating LLM-powered agents over your data using indexes and workflows**. For this course we'll focus on three main parts that help build agents in LlamaIndex: **Components**, **Agents and Tools** and **Workflows**.
6 |
7 | ![LlamaIndex](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/thumbnail.png)
8 |
9 | Let's look at these key parts of LlamaIndex and how they help with agents:
10 |
11 | - **Components**: The basic building blocks you use in LlamaIndex. These include things like prompts, models, and databases. Components often help connect LlamaIndex with other tools and libraries.
12 | - **Tools**: Tools are components that provide specific capabilities like searching, calculating, or accessing external services. They are the building blocks that enable agents to perform tasks.
13 | - **Agents**: Agents are autonomous components that can use tools and make decisions. They coordinate tool usage to accomplish complex goals.
14 | - **Workflows**: Step-by-step processes that chain components and logic together. Workflows, or agentic workflows, are a way to structure agentic behaviour without the explicit use of agents.
15 |
16 |
17 | ## What Makes LlamaIndex Special?
18 |
19 | While LlamaIndex does some things similar to other frameworks like smolagents, it has some key benefits:
20 |
21 | - **Clear Workflow System**: Workflows help break down how agents should make decisions step by step using an event-driven and async-first syntax. This helps you clearly compose and organize your logic.
22 | - **Advanced Document Parsing with LlamaParse**: LlamaParse was made specifically for LlamaIndex, so the integration is seamless, although it is a paid feature.
23 | - **Many Ready-to-Use Components**: LlamaIndex has been around for a while, so it works with lots of other frameworks. This means it has many tested and reliable components, like LLMs, retrievers, indexes, and more.
24 | - **LlamaHub**: is a registry of hundreds of these components, agents, and tools that you can use within LlamaIndex.
25 |
26 | All of these concepts are required in different scenarios to create useful agents.
27 | In the following sections, we will go over each of these concepts in detail.
28 | After mastering the concepts, we will use our learnings to **create applied use cases with Alfred the agent**!
29 |
30 | Getting our hands on LlamaIndex is exciting, right? So, what are we waiting for? Let's get started with **finding and installing the integrations we need using LlamaHub! 🚀**
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/llama-hub.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to the LlamaHub
2 |
3 | **LlamaHub is a registry of hundreds of integrations, agents and tools that you can use within LlamaIndex.**
4 |
5 | ![LlamaHub](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/llama-hub.png)
6 |
7 | We will be using various integrations in this course, so let's first look at the LlamaHub and how it can help us.
8 |
9 | Let's see how to find and install the dependencies for the components we need.
10 |
11 | ## Installation
12 |
13 | LlamaIndex installation instructions are available as a well-structured **overview on [LlamaHub](https://llamahub.ai/)**.
14 | This might be a bit overwhelming at first, but most of the **installation commands generally follow an easy-to-remember format**:
15 |
16 | ```bash
17 | pip install llama-index-{component-type}-{framework-name}
18 | ```
19 |
20 | Let's try to install the dependencies for an LLM and embedding component using the [Hugging Face inference API integration](https://llamahub.ai/l/llms/llama-index-llms-huggingface-api?from=llms).
21 |
22 | ```bash
23 | pip install llama-index-llms-huggingface-api llama-index-embeddings-huggingface
24 | ```
25 |
26 | ## Usage
27 |
28 | Once installed, we can see the usage patterns. You'll notice that the import paths follow the install command!
29 | Underneath, we can see an example of the usage of **the Hugging Face inference API for an LLM component**.
30 |
31 | ```python
32 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
33 | import os
34 | from dotenv import load_dotenv
35 |
36 | # Load the .env file
37 | load_dotenv()
38 |
39 | # Retrieve HF_TOKEN from the environment variables
40 | hf_token = os.getenv("HF_TOKEN")
41 |
42 | llm = HuggingFaceInferenceAPI(
43 | model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
44 | temperature=0.7,
45 | max_tokens=100,
46 | token=hf_token,
47 | )
48 |
49 | response = llm.complete("Hello, how are you?")
50 | print(response)
51 | # I am good, how can I help you today?
52 | ```
53 |
54 | Wonderful, we now know how to find, install and use the integrations for the components we need.
55 | **Let's dive deeper into the components** and see how we can use them to build our own agents.
56 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/quiz1.mdx:
--------------------------------------------------------------------------------
1 | # Small Quiz (ungraded) [[quiz1]]
2 |
3 | So far we've discussed the key components and tools used in LlamaIndex.
4 | It's time to make a short quiz, since **testing yourself** is the best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf).
5 | This will help you find **where you need to reinforce your knowledge**.
6 |
7 | This is an optional quiz and it's not graded.
8 |
9 | ### Q1: What is a QueryEngine?
10 | Which of the following best describes a QueryEngine component?
11 |
12 | <Question
13 | choices={[
14 | {
15 | text: "A system that only processes static text without any retrieval capabilities.",
16 | explain: "A QueryEngine must be able to retrieve and process relevant information.",
17 | },
18 | {
19 | text: "A component that finds and retrieves relevant information as part of the RAG process.",
20 | explain: "This captures the core purpose of a QueryEngine component.",
21 | correct: true
22 | },
23 | {
24 | text: "A tool that only stores vector embeddings without search functionality.",
25 | explain: "A QueryEngine does more than just store embeddings - it actively searches and retrieves information.",
26 | },
27 | {
28 | text: "A component that only evaluates response quality.",
29 | explain: "Evaluation is separate from the QueryEngine's main retrieval purpose.",
30 | }
31 | ]}
32 | />
33 |
34 | ---
35 |
36 | ### Q2: What is the Purpose of FunctionTools?
37 | Why are FunctionTools important for an Agent?
38 |
39 | <Question
40 | choices={[
41 | {
42 | text: "To handle large amounts of data storage.",
43 | explain: "FunctionTools are not primarily for data storage.",
44 | },
45 | {
46 | text: "To convert Python functions into tools that an agent can use.",
47 | explain: "FunctionTools wrap Python functions to make them accessible to agents.",
48 | correct: true
49 | },
50 | {
51 |       text: "To allow agents to create random function definitions.",
52 | explain: "FunctionTools serve the specific purpose of making functions available to agents.",
53 | },
54 | {
55 | text: "To only process text data.",
56 | explain: "FunctionTools can work with various types of functions, not just text processing.",
57 | }
58 | ]}
59 | />
60 |
61 | ---
62 |
63 | ### Q3: What are Toolspecs in LlamaIndex?
64 | What is the main purpose of Toolspecs?
65 |
66 | <Question
67 | choices={[
68 | {
69 | text: "They are redundant components that don't add functionality.",
70 | explain: "Toolspecs serve an important purpose in the LlamaIndex ecosystem.",
71 | },
72 | {
73 | text: "They are sets of community-created tools that extend agent capabilities.",
74 | explain: "Toolspecs allow the community to share and reuse tools.",
75 | correct: true
76 | },
77 | {
78 | text: "They are used solely for memory management.",
79 | explain: "Toolspecs are about providing tools, not managing memory.",
80 | },
81 | {
82 | text: "They only work with text processing.",
83 | explain: "Toolspecs can include various types of tools, not just text processing.",
84 | }
85 | ]}
86 | />
87 |
88 | ---
89 |
90 | ### Q4: What is Required to create a tool?
91 | What information must be included when creating a tool?
92 |
93 | <Question
94 | choices={[
95 | {
96 | text: "A function, a name, and description must be defined.",
97 | explain: "While these all make up a tool, the name and description can be parsed from the function and docstring.",
98 | },
99 | {
100 | text: "Only the name is required.",
101 | explain: "A function and description/docstring is also required for proper tool documentation.",
102 | },
103 | {
104 | text: "Only the description is required.",
105 | explain: "A function is required so that we have code to run when an agent selects a tool",
106 | },
107 | {
108 | text: "Only the function is required.",
109 | explain: "The name and description default to the name and docstring from the provided function",
110 | correct: true
111 | }
112 | ]}
113 | />
114 |
115 | ---
116 |
117 | Congrats on finishing this Quiz 🥳, if you missed some elements, take time to read again the chapter to reinforce your knowledge. If you pass it, you're ready to dive deeper into building with these components!
118 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/quiz2.mdx:
--------------------------------------------------------------------------------
1 | # Quick Self-Check (ungraded) [[quiz2]]
2 |
3 | What?! Another Quiz? We know, we know, ... 😅 But this short, ungraded quiz is here to **help you reinforce key concepts you've just learned**.
4 |
5 | This quiz covers agent workflows and interactions - essential components for building effective AI agents.
6 |
7 | ### Q1: What is the purpose of AgentWorkflow in LlamaIndex?
8 |
9 | <Question
10 | choices={[
11 | {
12 | text: "To run one or more agents with tools",
13 | explain: "Yes, the AgentWorkflow is the main way to quickly create a system with one or more agents.",
14 | correct: true
15 | },
16 | {
17 | text: "To create a single agent that can query your data without memory",
18 | explain: "No, the AgentWorkflow is more capable than that, the QueryEngine is for simple queries over your data.",
19 | },
20 | {
21 | text: "To automatically build tools for agents",
22 | explain: "The AgentWorkflow does not build tools, that is the job of the developer.",
23 | },
24 | {
25 | text: "To manage agent memory and state",
26 | explain: "Managing memory and state is not the primary purpose of AgentWorkflow.",
27 | }
28 | ]}
29 | />
30 |
31 | ---
32 |
33 | ### Q2: What object is used for keeping track of the state of the workflow?
34 |
35 | <Question
36 | choices={[
37 | {
38 | text: "State",
39 | explain: "State is not the correct object for workflow state management.",
40 | },
41 | {
42 | text: "Context",
43 | explain: "Context is the correct object used for keeping track of workflow state.",
44 | correct: true
45 | },
46 | {
47 | text: "WorkflowState",
48 | explain: "WorkflowState is not the correct object.",
49 | },
50 | {
51 | text: "Management",
52 | explain: "Management is not a valid object for workflow state.",
53 | }
54 | ]}
55 | />
56 |
57 | ---
58 |
59 | ### Q3: Which method should be used if you want an agent to remember previous interactions?
60 |
61 | <Question
62 | choices={[
63 | {
64 | text: "run(query_str)",
65 | explain: ".run(query_str) does not maintain conversation history.",
66 | },
67 | {
68 | text: "chat(query_str, ctx=ctx)",
69 | explain: "chat() is not a valid method on workflows.",
70 | },
71 | {
72 | text: "interact(query_str)",
73 | explain: "interact() is not a valid method for agent interactions.",
74 | },
75 | {
76 | text: "run(query_str, ctx=ctx)",
77 | explain: "By passing in and maintaining the context, we can maintain state!",
78 | correct: true
79 | }
80 | ]}
81 | />
82 |
83 | ---
84 |
85 | ### Q4: What is a key feature of Agentic RAG?
86 |
87 | <Question
88 | choices={[
89 | {
90 | text: "It can only use document-based tools, to answer questions in a RAG workflow",
91 | explain: "Agentic RAG can use different tools, including document-based tools.",
92 | },
93 | {
94 | text: "It automatically answers questions without tools, like a chatbot",
95 | explain: "Agentic RAG does use tools to answer questions.",
96 | },
97 | {
98 | text: "It can decide to use any tool to answer questions, including RAG tools",
99 | explain: "Agentic RAG has the flexibility to use different tools to answer questions.",
100 | correct: true
101 | },
102 | {
103 | text: "It only works with Function Calling Agents",
104 | explain: "Agentic RAG is not limited to Function Calling Agents.",
105 | }
106 | ]}
107 | />
108 |
109 | ---
110 |
111 |
112 | Got it? Great! Now let's **do a brief recap of the unit!**
113 |
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/tools.mdx:
--------------------------------------------------------------------------------
1 | # Using Tools in LlamaIndex
2 |
3 | **Defining a clear set of Tools is crucial to performance.** As we discussed in [unit 1](../../unit1/tools), clear tool interfaces are easier for LLMs to use.
4 | Much like a software API interface for human engineers, they can get more out of the tool if it's easy to understand how it works.
5 |
6 | There are **four main types of tools in LlamaIndex**:
7 |
8 | ![Tools](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/tools.png)
9 |
10 | 1. `FunctionTool`: Convert any Python function into a tool that an agent can use. It automatically figures out how the function works.
11 | 2. `QueryEngineTool`: A tool that lets agents use query engines. Since agents are built on query engines, they can also use other agents as tools.
12 | 3. `Toolspecs`: Sets of tools created by the community, which often include tools for specific services like Gmail.
13 | 4. `Utility Tools`: Special tools that help handle large amounts of data from other tools.
14 |
15 | We will go over each of them in more detail below.
16 |
17 | ## Creating a FunctionTool
18 |
19 | <Tip>
20 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/tools.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
21 | </Tip>
22 |
23 | A FunctionTool provides a simple way to wrap any Python function and make it available to an agent.
24 | You can pass either a synchronous or asynchronous function to the tool, along with optional `name` and `description` parameters.
25 | The name and description are particularly important as they help the agent understand when and how to use the tool effectively.
26 | Let's look at how to create a FunctionTool below and then call it.
27 |
28 | ```python
29 | from llama_index.core.tools import FunctionTool
30 |
31 | def get_weather(location: str) -> str:
32 | """Useful for getting the weather for a given location."""
33 | print(f"Getting weather for {location}")
34 | return f"The weather in {location} is sunny"
35 |
36 | tool = FunctionTool.from_defaults(
37 | get_weather,
38 | name="my_weather_tool",
39 | description="Useful for getting the weather for a given location.",
40 | )
41 | tool.call("New York")
42 | ```
43 |
44 | <Tip>When using an agent or LLM with function calling, the tool selected (and the arguments written for that tool) rely strongly on the tool name and description of the purpose and arguments of the tool. Learn more about function calling in the <a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/modules/function_calling.html">Function Calling Guide</a> and <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/function_calling.html">Function Calling Learning Guide</a>.</Tip>
45 |
46 | ## Creating a QueryEngineTool
47 |
48 | The `QueryEngine` we defined in the previous unit can be easily transformed into a tool using the `QueryEngineTool` class.
49 | Let's see how to create a `QueryEngineTool` from a `QueryEngine` in the example below.
50 |
51 | ```python
import chromadb
52 | from llama_index.core import VectorStoreIndex
53 | from llama_index.core.tools import QueryEngineTool
54 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
55 | from llama_index.embeddings.huggingface import HuggingFaceEmbedding
56 | from llama_index.vector_stores.chroma import ChromaVectorStore
57 |
58 | embed_model = HuggingFaceEmbedding("BAAI/bge-small-en-v1.5")
59 |
60 | db = chromadb.PersistentClient(path="./alfred_chroma_db")
61 | chroma_collection = db.get_or_create_collection("alfred")
62 | vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
63 |
64 | index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
65 |
66 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
67 | query_engine = index.as_query_engine(llm=llm)
68 | tool = QueryEngineTool.from_defaults(query_engine, name="some useful name", description="some useful description")
69 | ```
70 |
71 | ## Creating Toolspecs
72 |
73 | Think of `ToolSpecs` as collections of tools that work together harmoniously - like a well-organized professional toolkit.
74 | Just as a mechanic's toolkit contains complementary tools that work together for vehicle repairs, a `ToolSpec` combines related tools for specific purposes.
75 | For example, an accounting agent's `ToolSpec` might elegantly integrate spreadsheet capabilities, email functionality, and calculation tools to handle financial tasks with precision and efficiency.
76 |
77 | <details>
78 | <summary>Install the Google Toolspec</summary>
79 | As introduced in the <a href="./llama-hub">section on the LlamaHub</a>, we can install the Google toolspec with the following command:
80 |
81 | ```bash
82 | pip install llama-index-tools-google
83 | ```
84 | </details>
85 |
86 | And now we can load the toolspec and convert it to a list of tools.
87 |
88 | ```python
89 | from llama_index.tools.google import GmailToolSpec
90 |
91 | tool_spec = GmailToolSpec()
92 | tool_spec_list = tool_spec.to_tool_list()
93 | ```
94 |
95 | To get a more detailed view of the tools, we can take a look at the `metadata` of each tool.
96 |
97 | ```python
98 | [(tool.metadata.name, tool.metadata.description) for tool in tool_spec_list]
99 | ```
100 |
101 | ### Model Context Protocol (MCP) in LlamaIndex
102 |
103 | LlamaIndex also allows using MCP tools through a [ToolSpec on the LlamaHub](https://llamahub.ai/l/tools/llama-index-tools-mcp?from=).
104 | You can simply run an MCP server and start using it through the following implementation.
105 |
106 | ```python
107 | from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.workflow import Context
108 |
109 | # We assume there is an MCP server running on 127.0.0.1:8000; you can also point the client at your own MCP server.
110 | mcp_client = BasicMCPClient("http://127.0.0.1:8000/sse")
111 | mcp_tool = McpToolSpec(client=mcp_client)
112 |
113 | # get the agent
114 | agent = await get_agent(mcp_tool)
115 |
116 | # create the agent context
117 | agent_context = Context(agent)
118 | ```
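
Note that `get_agent` is not defined above. Below is one way it might look, reusing the `AgentWorkflow` pattern from the agents section and assuming `McpToolSpec` exposes the same `to_tool_list()` method as other ToolSpecs; treat this as a sketch and check the LlamaHub page for the exact API:

```python
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI


async def get_agent(tool_spec) -> AgentWorkflow:
    """Build an agent from the tools exposed by the MCP server."""
    tools = tool_spec.to_tool_list()  # assumption: same ToolSpec interface as above
    llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
    return AgentWorkflow.from_tools_or_functions(tools, llm=llm)
```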
119 |
120 | ## Utility Tools
121 |
122 | Oftentimes, directly querying an API **can return an excessive amount of data**, some of which may be irrelevant, overflow the context window of the LLM, or unnecessarily increase the number of tokens that you are using.
123 | Let's walk through our two main utility tools below.
124 |
125 | 1. `OnDemandToolLoader`: This tool turns any existing LlamaIndex data loader (BaseReader class) into a tool that an agent can use. The tool can be called with all the parameters needed to trigger `load_data` from the data loader, along with a natural language query string. During execution, we first load data from the data loader, index it (for instance with a vector store), and then query it 'on-demand'. All three of these steps happen in a single tool call.
126 | 2. `LoadAndSearchToolSpec`: The LoadAndSearchToolSpec takes in any existing Tool as input. As a tool spec, it implements `to_tool_list`, and when that function is called, two tools are returned: a loading tool and a search tool. Executing the loading tool calls the underlying Tool and then indexes the output (by default with a vector index), while executing the search tool takes a query string as input and calls the underlying index.
127 |
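As a rough illustration of the first utility (a sketch assuming the Wikipedia reader is installed with `pip install llama-index-readers-wikipedia`; the tool name and description are placeholders), an `OnDemandLoaderTool` wraps a reader so that loading, indexing, and querying happen in a single tool call:

```python
from llama_index.core.tools.ondemand_loader_tool import OnDemandLoaderTool
from llama_index.readers.wikipedia import WikipediaReader

reader = WikipediaReader()

# Each tool call loads the requested pages, indexes them, and answers the query in one step
wiki_lookup_tool = OnDemandLoaderTool.from_defaults(
    reader,
    name="wikipedia_lookup",
    description="Loads Wikipedia pages on demand and answers questions about their content.",
)
```
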
128 | <Tip>You can find toolspecs and utility tools on the <a href="https://llamahub.ai/">LlamaHub</a></Tip>
129 |
130 | Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can **use LlamaIndex to create configurable and manageable workflows!**
--------------------------------------------------------------------------------
/units/en/unit2/llama-index/workflows.mdx:
--------------------------------------------------------------------------------
1 | # Creating agentic workflows in LlamaIndex
2 |
3 | A workflow in LlamaIndex provides a structured way to organize your code into sequential and manageable steps.
4 |
5 | Such a workflow is created by defining `Steps` which are triggered by `Events`, and themselves emit `Events` to trigger further steps.
6 | Let's take a look at Alfred showing a LlamaIndex workflow for a RAG task.
7 |
8 | ![Workflow Schematic](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/workflows.png)
9 |
10 | **Workflows offer several key benefits:**
11 |
12 | - Clear organization of code into discrete steps
13 | - Event-driven architecture for flexible control flow
14 | - Type-safe communication between steps
15 | - Built-in state management
16 | - Support for both simple and complex agent interactions
17 |
18 | As you might have guessed, **workflows strike a great balance between the autonomy of agents and control over the overall workflow.**
19 |
20 | So, let's learn how to create a workflow ourselves!
21 |
22 | ## Creating Workflows
23 |
24 | <Tip>
25 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/workflows.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
26 | </Tip>
27 |
28 | ### Basic Workflow Creation
29 |
30 | <details>
31 | <summary>Install the Workflow package</summary>
32 | As introduced in the <a href="./llama-hub">section on the LlamaHub</a>, we can install the Workflow package with the following command:
33 |
34 | ```bash
35 | pip install llama-index-utils-workflow
36 | ```
37 | </details>
38 |
39 | We can create a single-step workflow by defining a class that inherits from `Workflow` and decorating its methods with `@step`.
40 | We will also need to add `StartEvent` and `StopEvent`, which are special events that are used to indicate the start and end of the workflow.
41 |
42 | ```python
43 | from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
44 |
45 | class MyWorkflow(Workflow):
46 | @step
47 | async def my_step(self, ev: StartEvent) -> StopEvent:
48 | # do something here
49 | return StopEvent(result="Hello, world!")
50 |
51 |
52 | w = MyWorkflow(timeout=10, verbose=False)
53 | result = await w.run()
54 | ```
55 |
56 | As you can see, we can now run the workflow by calling `w.run()`.
57 |
58 | ### Connecting Multiple Steps
59 |
60 | To connect multiple steps, we **create custom events that carry data between steps.**
61 | To do so, we need to add an `Event` that is passed between the steps and transfers the output of the first step to the second step.
62 |
63 | ```python
64 | from llama_index.core.workflow import Event
65 |
66 | class ProcessingEvent(Event):
67 | intermediate_result: str
68 |
69 | class MultiStepWorkflow(Workflow):
70 | @step
71 | async def step_one(self, ev: StartEvent) -> ProcessingEvent:
72 | # Process initial data
73 | return ProcessingEvent(intermediate_result="Step 1 complete")
74 |
75 | @step
76 | async def step_two(self, ev: ProcessingEvent) -> StopEvent:
77 | # Use the intermediate result
78 | final_result = f"Finished processing: {ev.intermediate_result}"
79 | return StopEvent(result=final_result)
80 |
81 | w = MultiStepWorkflow(timeout=10, verbose=False)
82 | result = await w.run()
83 | result
84 | ```
85 |
86 | The type hinting is important here, as it ensures that the workflow is executed correctly. Let's complicate things a bit more!
87 |
88 | ### Loops and Branches
89 |
90 | The type hinting is the most powerful part of workflows because it allows us to create branches, loops, and joins to facilitate more complex workflows.
91 |
92 | Let's show an example of **creating a loop** by using the union operator `|`.
93 | In the example below, we see that the `LoopEvent` is taken as input for the step and can also be returned as output.
94 |
95 | ```python
96 | from llama_index.core.workflow import Event
97 | import random
98 |
99 |
100 | class ProcessingEvent(Event):
101 | intermediate_result: str
102 |
103 |
104 | class LoopEvent(Event):
105 | loop_output: str
106 |
107 |
108 | class MultiStepWorkflow(Workflow):
109 | @step
110 | async def step_one(self, ev: StartEvent | LoopEvent) -> ProcessingEvent | LoopEvent:
111 | if random.randint(0, 1) == 0:
112 | print("Bad thing happened")
113 | return LoopEvent(loop_output="Back to step one.")
114 | else:
115 | print("Good thing happened")
116 | return ProcessingEvent(intermediate_result="First step complete.")
117 |
118 | @step
119 | async def step_two(self, ev: ProcessingEvent) -> StopEvent:
120 | # Use the intermediate result
121 | final_result = f"Finished processing: {ev.intermediate_result}"
122 | return StopEvent(result=final_result)
123 |
124 |
125 | w = MultiStepWorkflow(verbose=False)
126 | result = await w.run()
127 | result
128 | ```
129 |
130 | ### Drawing Workflows
131 |
132 | We can also draw workflows. Let's use the `draw_all_possible_flows` function to draw the workflow; it saves the diagram to an HTML file.
133 |
134 | ```python
135 | from llama_index.utils.workflow import draw_all_possible_flows
136 |
137 | w = ... # as defined in the previous section
138 | draw_all_possible_flows(w, "flow.html")
139 | ```
140 |
141 | ![workflow drawing](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/workflow-draw.png)
142 |
143 | There is one last cool trick that we will cover in the course, which is the ability to add state to the workflow.
144 |
145 | ### State Management
146 |
147 | State management is useful when you want to keep track of the state of the workflow, so that every step has access to the same state.
148 | We can do this by adding a parameter with the `Context` type hint to the step function.
149 |
150 | ```python
151 | from llama_index.core.workflow import Context, StartEvent, StopEvent
152 |
153 |
154 | @step
155 | async def query(self, ctx: Context, ev: StartEvent) -> StopEvent:
156 | # store query in the context
157 | await ctx.set("query", "What is the capital of France?")
158 |
159 | # do something with context and event
160 | val = ...
161 |
162 | # retrieve query from the context
163 | query = await ctx.get("query")
164 |
165 | return StopEvent(result=val)
166 | ```
167 |
168 | Great! Now you know how to create basic workflows in LlamaIndex!
169 |
170 | <Tip>There are some more complex nuances to workflows, which you can learn about in <a href="https://docs.llamaindex.ai/en/stable/understanding/workflows/">the LlamaIndex documentation</a>.</Tip>
171 |
172 | However, there is another way to create workflows, which relies on the `AgentWorkflow` class. Let's take a look at how we can use this to create a multi-agent workflow.
173 |
174 | ## Automating workflows with Multi-Agent Workflows
175 |
176 | Instead of manual workflow creation, we can use the **`AgentWorkflow` class to create a multi-agent workflow**.
177 | The `AgentWorkflow` uses Workflow Agents to allow you to create a system of one or more agents that can collaborate and hand off tasks to each other based on their specialized capabilities.
178 | This enables building complex agent systems where different agents handle different aspects of a task.
179 | Instead of importing classes from `llama_index.core.agent`, we will import the agent classes from `llama_index.core.agent.workflow`.
180 | One agent must be designated as the root agent in the `AgentWorkflow` constructor.
181 | When a user message comes in, it is first routed to the root agent.
182 |
183 | Each agent can then:
184 |
185 | - Handle the request directly using their tools
186 | - Hand off to another agent better suited for the task
187 | - Return a response to the user
188 |
189 | Let's see how to create a multi-agent workflow.
190 |
191 | ```python
192 | from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent
193 | from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
194 |
195 | # Define some tools
196 | def add(a: int, b: int) -> int:
197 | """Add two numbers."""
198 | return a + b
199 |
200 | def multiply(a: int, b: int) -> int:
201 | """Multiply two numbers."""
202 | return a * b
203 |
204 | llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
205 |
206 | # we can pass functions directly without FunctionTool -- the fn/docstring are parsed for the name/description
207 | multiply_agent = ReActAgent(
208 | name="multiply_agent",
209 | description="Is able to multiply two integers",
210 | system_prompt="A helpful assistant that can use a tool to multiply numbers.",
211 | tools=[multiply],
212 | llm=llm,
213 | )
214 |
215 | addition_agent = ReActAgent(
216 | name="add_agent",
217 | description="Is able to add two integers",
218 | system_prompt="A helpful assistant that can use a tool to add numbers.",
219 | tools=[add],
220 | llm=llm,
221 | )
222 |
223 | # Create the workflow
224 | workflow = AgentWorkflow(
225 | agents=[multiply_agent, addition_agent],
226 | root_agent="multiply_agent",
227 | )
228 |
229 | # Run the system
230 | response = await workflow.run(user_msg="Can you add 5 and 3?")
231 | ```
232 |
233 | Agent tools can also modify the workflow state we mentioned earlier. Before starting the workflow, we can provide an initial state dict that will be available to all agents.
234 | The state is stored in the `state` key of the workflow context. It will be injected into the `state_prompt`, which augments each new user message.
235 |
236 | Let's inject a counter to count function calls by modifying the previous example:
237 |
238 | ```python
239 | from llama_index.core.workflow import Context
240 |
241 | # Define some tools
242 | async def add(ctx: Context, a: int, b: int) -> int:
243 | """Add two numbers."""
244 | # update our count
245 | cur_state = await ctx.get("state")
246 | cur_state["num_fn_calls"] += 1
247 | await ctx.set("state", cur_state)
248 |
249 | return a + b
250 |
251 | async def multiply(ctx: Context, a: int, b: int) -> int:
252 | """Multiply two numbers."""
253 | # update our count
254 | cur_state = await ctx.get("state")
255 | cur_state["num_fn_calls"] += 1
256 | await ctx.set("state", cur_state)
257 |
258 | return a * b
259 |
260 | ...
261 |
262 | workflow = AgentWorkflow(
263 | agents=[multiply_agent, addition_agent],
264 |     root_agent="multiply_agent",
265 | initial_state={"num_fn_calls": 0},
266 | state_prompt="Current state: {state}. User message: {msg}",
267 | )
268 |
269 | # run the workflow with context
270 | ctx = Context(workflow)
271 | response = await workflow.run(user_msg="Can you add 5 and 3?", ctx=ctx)
272 |
273 | # pull out and inspect the state
274 | state = await ctx.get("state")
275 | print(state["num_fn_calls"])
276 | ```
277 |
278 | Congratulations! You have now mastered the basics of Agents in LlamaIndex! 🎉
279 |
280 | Let's continue with one final quiz to solidify your knowledge! 🚀
281 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion
2 |
3 | Congratulations on finishing the `smolagents` module of this second Unit 🥳
4 |
5 | You’ve just mastered the fundamentals of `smolagents` and built your own Agent! Now that you have these skills, you can start creating Agents that solve tasks you're interested in.
6 |
7 | In the next module, you're going to learn **how to build Agents with LlamaIndex**.
8 |
9 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
10 |
11 | ### Keep Learning, stay awesome 🤗
12 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/final_quiz.mdx:
--------------------------------------------------------------------------------
1 | # Exam Time!
2 |
3 | Well done on working through the material on `smolagents`! You've already achieved a lot. Now, it's time to put your knowledge to the test with a quiz. 🧠
4 |
5 | ## Instructions
6 |
7 | - The quiz consists of code questions.
8 | - You will be given instructions to complete the code snippets.
9 | - Read the instructions carefully and complete the code snippets accordingly.
10 | - For each question, you will be given the result and some feedback.
11 |
12 | 🧘 **This quiz is ungraded and uncertified**. It's about you understanding the `smolagents` library and knowing whether you should spend more time on the written material. In the coming units you'll put this knowledge to the test in use cases and projects.
13 |
14 | Let's get started!
15 |
16 | ## Quiz 🚀
17 |
18 | <iframe
19 | src="https://agents-course-unit2-smolagents-quiz.hf.space"
20 | frameborder="0"
21 | width="850"
22 | height="450"
23 | ></iframe>
24 |
25 | You can also access the quiz 👉 [here](https://huggingface.co/spaces/agents-course/unit2_smolagents_quiz)
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to `smolagents`
2 |
3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/thumbnail.jpg" alt="Unit 2.1 Thumbnail"/>
4 |
5 | Welcome to this module, where you'll learn **how to build effective agents** using the [`smolagents`](https://github.com/huggingface/smolagents) library, which provides a lightweight framework for creating capable AI agents.
6 |
7 | `smolagents` is a Hugging Face library; therefore, we would appreciate your support by **starring** the smolagents [`repository`](https://github.com/huggingface/smolagents):
8 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/star_smolagents.gif" alt="starring smolagents"/>
9 |
10 | ## Module Overview
11 |
12 | This module provides a comprehensive overview of key concepts and practical strategies for building intelligent agents using `smolagents`.
13 |
14 | With so many open-source frameworks available, it's essential to understand the components and capabilities that make `smolagents` a useful option or to determine when another solution might be a better fit.
15 |
16 | We'll explore critical agent types, including code agents designed for software development tasks, tool calling agents for creating modular, function-driven workflows, and retrieval agents that access and synthesize information.
17 |
18 | Additionally, we'll cover the orchestration of multiple agents as well as the integration of vision capabilities and web browsing, which unlock new possibilities for dynamic and context-aware applications.
19 |
20 | In this unit, Alfred, the agent from Unit 1, makes his return. This time, he’s using the `smolagents` framework for his internal workings. Together, we’ll explore the key concepts behind this framework as Alfred tackles various tasks. Alfred is organizing a party at the Wayne Manor while the Wayne family 🦇 is away, and he has plenty to do. Join us as we showcase his journey and how he handles these tasks with `smolagents`!
21 |
22 | <Tip>
23 |
24 | In this unit, you will learn to build AI agents with the `smolagents` library. Your agents will be able to search for data, execute code, and interact with web pages. You will also learn how to combine multiple agents to create more powerful systems.
25 |
26 | </Tip>
27 |
28 | ![Alfred the agent](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/this-is-alfred.jpg)
29 |
30 | ## Contents
31 |
32 | During this unit on `smolagents`, we cover:
33 |
34 | ### 1️⃣ [Why Use smolagents](./why_use_smolagents)
35 |
36 | `smolagents` is one of the many open-source agent frameworks available for application development. Alternative options include `LlamaIndex` and `LangGraph`, which are also covered in other modules in this course. `smolagents` offers several key features that might make it a great fit for specific use cases, but we should always consider all options when selecting a framework. We'll explore the advantages and drawbacks of using `smolagents`, helping you make an informed decision based on your project's requirements.
37 |
38 | ### 2️⃣ [CodeAgents](./code_agents)
39 |
40 | `CodeAgents` are the primary type of agent in `smolagents`. Instead of generating JSON or text, these agents produce Python code to perform actions. This module explores their purpose, functionality, and how they work, along with hands-on examples to showcase their capabilities.
41 |
42 | ### 3️⃣ [ToolCallingAgents](./tool_calling_agents)
43 |
44 | `ToolCallingAgents` are the second type of agent supported by `smolagents`. Unlike `CodeAgents`, which generate Python code, these agents rely on JSON/text blobs that the system must parse and interpret to execute actions. This module covers their functionality, their key differences from `CodeAgents`, and provides an example to illustrate their usage.
45 |
46 | ### 4️⃣ [Tools](./tools)
47 |
48 | As we saw in Unit 1, tools are functions that an LLM can use within an agentic system, and they act as the essential building blocks for agent behavior. This module covers how to create tools, their structure, and different implementation methods using the `Tool` class or the `@tool` decorator. You'll also learn about the default toolbox, how to share tools with the community, and how to load community-contributed tools for use in your agents.
49 |
50 | ### 5️⃣ [Retrieval Agents](./retrieval_agents)
51 |
52 | Retrieval agents allow models access to knowledge bases, making it possible to search, synthesize, and retrieve information from multiple sources. They leverage vector stores for efficient retrieval and implement **Retrieval-Augmented Generation (RAG)** patterns. These agents are particularly useful for integrating web search with custom knowledge bases while maintaining conversation context through memory systems. This module explores implementation strategies, including fallback mechanisms for robust information retrieval.
53 |
54 | ### 6️⃣ [Multi-Agent Systems](./multi_agent_systems)
55 |
56 | Orchestrating multiple agents effectively is crucial for building powerful, multi-agent systems. By combining agents with different capabilities—such as a web search agent with a code execution agent—you can create more sophisticated solutions. This module focuses on designing, implementing, and managing multi-agent systems to maximize efficiency and reliability.
57 |
58 | ### 7️⃣ [Vision and Browser agents](./vision_agents)
59 |
60 | Vision agents extend traditional agent capabilities by incorporating **Vision-Language Models (VLMs)**, enabling them to process and interpret visual information. This module explores how to design and integrate VLM-powered agents, unlocking advanced functionalities like image-based reasoning, visual data analysis, and multimodal interactions. We will also use vision agents to build a browser agent that can browse the web and extract information from it.
61 |
62 | ## Resources
63 |
64 | - [smolagents Documentation](https://huggingface.co/docs/smolagents) - Official docs for the smolagents library
65 | - [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) - Research paper on agent architectures
66 | - [Agent Guidelines](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - Best practices for building reliable agents
67 | - [LangGraph Agents](https://langchain-ai.github.io/langgraph/) - Additional examples of agent implementations
68 | - [Function Calling Guide](https://platform.openai.com/docs/guides/function-calling) - Understanding function calling in LLMs
69 | - [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) - Guide to implementing effective RAG
70 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/quiz1.mdx:
--------------------------------------------------------------------------------
1 | # Small Quiz (ungraded) [[quiz1]]
2 |
3 | Let's test your understanding of `smolagents` with a quick quiz! Remember, testing yourself helps reinforce learning and identify areas that may need review.
4 |
5 | This is an optional quiz and it's not graded.
6 |
7 | ### Q1: What is one of the primary advantages of choosing `smolagents` over other frameworks?
8 | Which statement best captures a core strength of the `smolagents` approach?
9 |
10 | <Question
11 | choices={[
12 | {
13 | text: "It uses highly specialized configuration files and a steep learning curve to ensure only expert developers can use it",
14 | explain: "smolagents is designed for simplicity and minimal code complexity, not steep learning curves.",
15 | },
16 | {
17 | text: "It supports a code-first approach with minimal abstractions, letting agents interact directly via Python function calls",
18 | explain: "Yes, smolagents emphasizes a straightforward, code-centric design with minimal abstractions.",
19 | correct: true
20 | },
21 | {
22 | text: "It focuses on JSON-based actions, removing the need for agents to write any code",
23 | explain: "While smolagents supports JSON-based tool calls (ToolCallingAgents), the library emphasizes code-based approaches with CodeAgents.",
24 | },
25 | {
26 | text: "It deeply integrates with a single LLM provider and specialized hardware",
27 | explain: "smolagents supports multiple model providers and does not require specialized hardware.",
28 | }
29 | ]}
30 | />
31 |
32 | ---
33 |
34 | ### Q2: In which scenario would you likely benefit most from using smolagents?
35 | Which situation aligns well with what smolagents does best?
36 |
37 | <Question
38 | choices={[
39 | {
40 | text: "Prototyping or experimenting quickly with agent logic, particularly when your application is relatively straightforward",
41 | explain: "Yes. smolagents is designed for simple and nimble agent creation without extensive setup overhead.",
42 | correct: true
43 | },
44 | {
45 | text: "Building a large-scale enterprise system where you need dozens of microservices and real-time data pipelines",
46 | explain: "While possible, smolagents is more focused on lightweight, code-centric experimentation rather than heavy enterprise infrastructure.",
47 | },
48 | {
49 | text: "Needing a framework that only supports cloud-based LLMs and forbids local inference",
50 | explain: "smolagents offers flexible integration with local or hosted models, not exclusively cloud-based LLMs.",
51 | },
52 | {
53 | text: "A scenario that requires advanced orchestration, multi-modal perception, and enterprise-scale features out-of-the-box",
54 | explain: "While you can integrate advanced capabilities, smolagents itself is lightweight and minimal at its core.",
55 | }
56 | ]}
57 | />
58 |
59 | ---
60 |
61 | ### Q3: smolagents offers flexibility in model integration. Which statement best reflects its approach?
62 | Choose the most accurate description of how smolagents interoperates with LLMs.
63 |
64 | <Question
65 | choices={[
66 | {
67 | text: "It only provides a single built-in model and does not allow custom integrations",
68 | explain: "smolagents supports multiple different backends and user-defined models.",
69 | },
70 | {
71 | text: "It requires you to implement your own model connector for every LLM usage",
72 | explain: "There are multiple prebuilt connectors that make LLM integration straightforward.",
73 | },
74 | {
75 | text: "It only integrates with open-source LLMs but not commercial APIs",
76 | explain: "smolagents can integrate with both open-source and commercial model APIs.",
77 | },
78 | {
79 | text: "It can be used with a wide range of LLMs, offering predefined classes like TransformersModel, HfApiModel, and LiteLLMModel",
80 | explain: "This is correct. smolagents supports flexible model integration through various classes.",
81 | correct: true
82 | }
83 | ]}
84 | />
85 |
86 | ---
87 |
88 | ### Q4: How does smolagents handle the debate between code-based actions and JSON-based actions?
89 | Which statement correctly characterizes smolagents' philosophy about action formats?
90 |
91 | <Question
92 | choices={[
93 | {
94 | text: "It only allows JSON-based actions for all agent tasks, requiring a parser to extract the tool calls",
95 | explain: "ToolCallingAgent uses JSON-based calls, but smolagents also provides a primary CodeAgent option that writes Python code.",
96 | },
97 | {
98 | text: "It focuses on code-based actions via a CodeAgent but also supports JSON-based tool calls with a ToolCallingAgent",
99 | explain: "Yes, smolagents primarily recommends code-based actions but includes a JSON-based alternative for users who prefer it or need it.",
100 | correct: true
101 | },
102 | {
103 | text: "It disallows any external function calls, instead requiring all logic to reside entirely within the LLM",
104 | explain: "smolagents is specifically designed to grant LLMs the ability to call tools or code externally.",
105 | },
106 | {
107 | text: "It requires users to manually convert every code snippet into a JSON object before running the agent",
108 | explain: "smolagents can automatically manage code snippet creation within the CodeAgent path, no manual JSON conversion necessary.",
109 | }
110 | ]}
111 | />
112 |
113 | ---
114 |
115 | ### Q5: How does smolagents integrate with the Hugging Face Hub for added benefits?
116 | Which statement accurately describes one of the core advantages of Hub integration?
117 |
118 | <Question
119 | choices={[
120 | {
121 | text: "It automatically upgrades all public models to commercial license tiers",
122 | explain: "Hub integration doesn't change the license tier for models or tools.",
123 | },
124 | {
125 | text: "It disables local inference entirely, forcing remote model usage only",
126 | explain: "Users can still do local inference if they prefer; pushing to the Hub doesn't override local usage.",
127 | },
128 | {
129 | text: "It allows you to push and share agents or tools, making them easily discoverable and reusable by other developers",
130 | explain: "smolagents supports uploading agents and tools to the HF Hub for others to reuse.",
131 | correct: true
132 | },
133 | {
134 | text: "It permanently stores all your code-based agents, preventing any updates or versioning",
135 | explain: "Hub repositories support updates and version control, so you can revise your code-based agents any time.",
136 | }
137 | ]}
138 | />
139 |
140 | ---
141 |
142 | Congratulations on completing this quiz! 🎉 If you missed any questions, consider reviewing the *Why use smolagents* section for a deeper understanding. If you did well, you're ready to explore more advanced topics in smolagents!
143 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/quiz2.mdx:
--------------------------------------------------------------------------------
1 | # Small Quiz (ungraded) [[quiz2]]
2 |
3 | It's time to test your understanding of the *Code Agents*, *Tool Calling Agents*, and *Tools* sections. This quiz is optional and not graded.
4 |
5 | ---
6 |
7 | ### Q1: What is the key difference between creating a tool with the `@tool` decorator versus creating a subclass of `Tool` in smolagents?
8 |
9 | Which statement best describes the distinction between these two approaches for defining tools?
10 |
11 | <Question
12 | choices={[
13 | {
14 | text: "Using the <code>@tool</code> decorator is mandatory for retrieval-based tools, while subclasses of <code>Tool</code> are only for text-generation tasks",
15 | explain: "Both approaches can be used for any type of tool, including retrieval-based or text-generation tools.",
16 | },
17 | {
18 | text: "The <code>@tool</code> decorator is recommended for simple function-based tools, while subclasses of <code>Tool</code> offer more flexibility for complex functionality or custom metadata",
19 | explain: "This is correct. The decorator approach is simpler, but subclassing allows more customized behavior.",
20 | correct: true
21 | },
22 | {
23 | text: "<code>@tool</code> can only be used in multi-agent systems, while creating a <code>Tool</code> subclass is for single-agent scenarios",
24 | explain: "All agents (single or multi) can use either approach to define tools; there is no such restriction.",
25 | },
26 | {
27 | text: "Decorating a function with <code>@tool</code> replaces the need for a docstring, whereas subclasses must not include docstrings",
28 | explain: "Both methods benefit from clear docstrings. The decorator doesn't replace them, and a subclass can still have docstrings.",
29 | }
30 | ]}
31 | />
32 |
33 | ---
34 |
35 | ### Q2: How does a CodeAgent handle multi-step tasks using the ReAct (Reason + Act) approach?
36 |
37 | Which statement correctly describes how the CodeAgent executes a series of steps to solve a task?
38 |
39 | <Question
40 | choices={[
41 | {
42 | text: "It passes each step to a different agent in a multi-agent system, then combines results",
43 | explain: "Although multi-agent systems can distribute tasks, CodeAgent itself can handle multiple steps on its own using ReAct.",
44 | },
45 | {
46 | text: "It stores every action in JSON for easy parsing before executing them all at once",
47 | explain: "This behavior matches ToolCallingAgent's JSON-based approach, not CodeAgent.",
48 | },
49 | {
50 | text: "It cycles through writing internal thoughts, generating Python code, executing the code, and logging the results until it arrives at a final answer",
51 | explain: "Correct. This describes the ReAct pattern that CodeAgent uses, including iterative reasoning and code execution.",
52 | correct: true
53 | },
54 | {
55 | text: "It relies on a vision module to validate code output before continuing to the next step",
56 | explain: "Vision capabilities are supported in smolagents, but they're not a default requirement for CodeAgent or the ReAct approach.",
57 | }
58 | ]}
59 | />
60 |
61 | ---
62 |
63 | ### Q3: Which of the following is a primary advantage of sharing a tool on the Hugging Face Hub?
64 |
65 | Select the best reason why a developer might upload and share their custom tool.
66 |
67 | <Question
68 | choices={[
69 | {
70 | text: "It automatically integrates the tool with a MultiStepAgent for retrieval-augmented generation",
71 | explain: "Sharing a tool doesn't automatically set up retrieval or multi-step logic. It's just making the tool available.",
72 | },
73 | {
74 | text: "It allows others to discover, reuse, and integrate your tool in their smolagents without extra setup",
75 | explain: "Yes. Sharing on the Hub makes tools accessible for anyone (including yourself) to download and reuse quickly.",
76 | correct: true
77 | },
78 | {
79 | text: "It ensures that only CodeAgents can invoke the tool while ToolCallingAgents cannot",
80 | explain: "Both CodeAgents and ToolCallingAgents can invoke shared tools. There's no restriction by agent type.",
81 | },
82 | {
83 | text: "It converts your tool into a fully vision-capable function for image processing",
84 | explain: "Tool sharing doesn't alter the tool's functionality or add vision capabilities automatically.",
85 | }
86 | ]}
87 | />
88 |
89 | ---
90 |
91 | ### Q4: ToolCallingAgent differs from CodeAgent in how it executes actions. Which statement is correct?
92 |
93 | Choose the option that accurately describes how ToolCallingAgent works.
94 |
95 | <Question
96 | choices={[
97 | {
98 | text: "ToolCallingAgent is only compatible with a multi-agent system, while CodeAgent can run alone",
99 | explain: "Either agent can be used alone or as part of a multi-agent system.",
100 | },
101 | {
102 | text: "ToolCallingAgent delegates all reasoning to a separate retrieval agent, then returns a final answer",
103 | explain: "ToolCallingAgent still uses a main LLM for reasoning; it doesn't rely solely on retrieval agents.",
104 | },
105 | {
106 | text: "ToolCallingAgent outputs JSON instructions specifying tool calls and arguments, which get parsed and executed",
107 | explain: "This is correct. ToolCallingAgent uses the JSON approach to define tool calls.",
108 | correct: true
109 | },
110 | {
111 | text: "ToolCallingAgent is only meant for single-step tasks and automatically stops after calling one tool",
112 | explain: "ToolCallingAgent can perform multiple steps if needed, just like CodeAgent.",
113 | }
114 | ]}
115 | />
116 |
117 | ---
118 |
119 | ### Q5: What is included in the smolagents default toolbox, and why might you use it?
120 |
121 | Which statement best captures the purpose and contents of the default toolbox in smolagents?
122 |
123 | <Question
124 | choices={[
125 | {
126 | text: "It provides a set of commonly-used tools such as DuckDuckGo search, PythonInterpreterTool, and a final answer tool for quick prototyping",
127 | explain: "Correct. The default toolbox contains these ready-made tools for easy integration when building agents.",
128 | correct: true
129 | },
130 | {
131 | text: "It only supports vision-based tasks like image classification or OCR by default",
132 | explain: "Although smolagents can integrate vision-based features, the default toolbox isn't exclusively vision-oriented.",
133 | },
134 | {
135 | text: "It is intended solely for multi-agent systems and is incompatible with a single CodeAgent",
136 | explain: "The default toolbox can be used by any agent type, single or multi-agent setups alike.",
137 | },
138 | {
139 | text: "It adds advanced retrieval-based functionality for large-scale question answering from a vector store",
140 | explain: "While you can build retrieval tools, the default toolbox does not automatically provide advanced RAG features.",
141 | }
142 | ]}
143 | />
144 |
145 | ---
146 |
147 | Congratulations on completing this quiz! 🎉 If any questions gave you trouble, revisit the *Code Agents*, *Tool Calling Agents*, or *Tools* sections to strengthen your understanding. If you aced it, you're well on your way to building robust smolagents applications!
148 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/retrieval_agents.mdx:
--------------------------------------------------------------------------------
1 | <CourseFloatingBanner chapter={2}
2 | classNames="absolute z-10 right-0 top-0"
3 | notebooks={[
4 | {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/retrieval_agents.ipynb"},
5 | ]} />
6 |
7 | # Building Agentic RAG Systems
8 |
9 | <Tip>
10 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/retrieval_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
11 | </Tip>
12 |
13 | Retrieval Augmented Generation (RAG) systems combine the capabilities of data retrieval and generation models to provide context-aware responses. For example, a user's query is passed to a search engine, and the retrieved results are given to the model along with the query. The model then generates a response based on the query and retrieved information.
14 |
15 | Agentic RAG (Retrieval-Augmented Generation) extends traditional RAG systems by **combining autonomous agents with dynamic knowledge retrieval**.
16 |
17 | While traditional RAG systems use an LLM to answer queries based on retrieved data, agentic RAG **enables intelligent control of both retrieval and generation processes**, improving efficiency and accuracy.
18 |
19 | Traditional RAG systems face key limitations, such as **relying on a single retrieval step** and focusing on direct semantic similarity with the user’s query, which may overlook relevant information.
20 |
21 | Agentic RAG addresses these issues by allowing the agent to autonomously formulate search queries, critique retrieved results, and conduct multiple retrieval steps for a more tailored and comprehensive output.
22 |
23 | ## Basic Retrieval with DuckDuckGo
24 |
25 | Let's build a simple agent that can search the web using DuckDuckGo. This agent will retrieve information and synthesize responses to answer queries. With Agentic RAG, Alfred's agent can:
26 |
27 | * Search for the latest superhero party trends
28 | * Refine results to include luxury elements
29 | * Synthesize information into a complete plan
30 |
31 | Here's how Alfred's agent can achieve this:
32 |
33 | ```python
34 | from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
35 |
36 | # Initialize the search tool
37 | search_tool = DuckDuckGoSearchTool()
38 |
39 | # Initialize the model
40 | model = HfApiModel()
41 |
42 | agent = CodeAgent(
43 | model=model,
44 | tools=[search_tool],
45 | )
46 |
47 | # Example usage
48 | response = agent.run(
49 | "Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering."
50 | )
51 | print(response)
52 | ```
53 |
54 | The agent follows this process:
55 |
56 | 1. **Analyzes the Request:** Alfred’s agent identifies the key elements of the query—luxury superhero-themed party planning, with focus on decor, entertainment, and catering.
57 | 2. **Performs Retrieval:** The agent leverages DuckDuckGo to search for the most relevant and up-to-date information, ensuring it aligns with Alfred’s refined preferences for a luxurious event.
58 | 3. **Synthesizes Information:** After gathering the results, the agent processes them into a cohesive, actionable plan for Alfred, covering all aspects of the party.
59 | 4. **Stores for Future Reference:** The agent stores the retrieved information for easy access when planning future events, optimizing efficiency in subsequent tasks.
60 |
61 | ## Custom Knowledge Base Tool
62 |
63 | For specialized tasks, a custom knowledge base can be invaluable. Let's create a tool that queries a vector database of technical documentation or specialized knowledge. Using semantic search, the agent can find the most relevant information for Alfred's needs.
64 |
65 | A vector database stores numerical representations (embeddings) of text or other data, created by machine learning models. It enables semantic search by identifying similar meanings in high-dimensional space.
66 |
67 | This approach combines predefined knowledge with semantic search to provide context-aware solutions for event planning. With specialized knowledge access, Alfred can perfect every detail of the party.
68 |
69 | In this example, we'll create a tool that retrieves party planning ideas from a custom knowledge base. We'll use a BM25 retriever to search the knowledge base and return the top results, and `RecursiveCharacterTextSplitter` to split the documents into smaller chunks for more efficient search.
70 |
71 | ```python
72 | from langchain.docstore.document import Document
73 | from langchain.text_splitter import RecursiveCharacterTextSplitter
74 | from smolagents import Tool
75 | from langchain_community.retrievers import BM25Retriever
76 | from smolagents import CodeAgent, HfApiModel
77 |
78 | class PartyPlanningRetrieverTool(Tool):
79 | name = "party_planning_retriever"
80 | description = "Uses semantic search to retrieve relevant party planning ideas for Alfred’s superhero-themed party at Wayne Manor."
81 | inputs = {
82 | "query": {
83 | "type": "string",
84 | "description": "The query to perform. This should be a query related to party planning or superhero themes.",
85 | }
86 | }
87 | output_type = "string"
88 |
89 | def __init__(self, docs, **kwargs):
90 | super().__init__(**kwargs)
91 | self.retriever = BM25Retriever.from_documents(
92 | docs, k=5 # Retrieve the top 5 documents
93 | )
94 |
95 | def forward(self, query: str) -> str:
96 | assert isinstance(query, str), "Your search query must be a string"
97 |
98 | docs = self.retriever.invoke(
99 | query,
100 | )
101 | return "\nRetrieved ideas:\n" + "".join(
102 | [
103 | f"\n\n===== Idea {str(i)} =====\n" + doc.page_content
104 | for i, doc in enumerate(docs)
105 | ]
106 | )
107 |
108 | # Simulate a knowledge base about party planning
109 | party_ideas = [
110 | {"text": "A superhero-themed masquerade ball with luxury decor, including gold accents and velvet curtains.", "source": "Party Ideas 1"},
111 | {"text": "Hire a professional DJ who can play themed music for superheroes like Batman and Wonder Woman.", "source": "Entertainment Ideas"},
112 | {"text": "For catering, serve dishes named after superheroes, like 'The Hulk's Green Smoothie' and 'Iron Man's Power Steak.'", "source": "Catering Ideas"},
113 | {"text": "Decorate with iconic superhero logos and projections of Gotham and other superhero cities around the venue.", "source": "Decoration Ideas"},
114 | {"text": "Interactive experiences with VR where guests can engage in superhero simulations or compete in themed games.", "source": "Entertainment Ideas"}
115 | ]
116 |
117 | source_docs = [
118 | Document(page_content=doc["text"], metadata={"source": doc["source"]})
119 | for doc in party_ideas
120 | ]
121 |
122 | # Split the documents into smaller chunks for more efficient search
123 | text_splitter = RecursiveCharacterTextSplitter(
124 | chunk_size=500,
125 | chunk_overlap=50,
126 | add_start_index=True,
127 | strip_whitespace=True,
128 | separators=["\n\n", "\n", ".", " ", ""],
129 | )
130 | docs_processed = text_splitter.split_documents(source_docs)
131 |
132 | # Create the retriever tool
133 | party_planning_retriever = PartyPlanningRetrieverTool(docs_processed)
134 |
135 | # Initialize the agent
136 | agent = CodeAgent(tools=[party_planning_retriever], model=HfApiModel())
137 |
138 | # Example usage
139 | response = agent.run(
140 | "Find ideas for a luxury superhero-themed party, including entertainment, catering, and decoration options."
141 | )
142 |
143 | print(response)
144 | ```
145 |
146 | This enhanced agent can:
147 | 1. First check the documentation for relevant information
148 | 2. Combine insights from the knowledge base
149 | 3. Maintain conversation context in memory
150 |
151 | ## Enhanced Retrieval Capabilities
152 |
153 | When building agentic RAG systems, the agent can employ sophisticated strategies like:
154 |
155 | 1. **Query Reformulation:** Instead of using the raw user query, the agent can craft optimized search terms that better match the target documents
156 | 2. **Multi-Step Retrieval:** The agent can perform multiple searches, using initial results to inform subsequent queries
157 | 3. **Source Integration:** Information can be combined from multiple sources like web search and local documentation
158 | 4. **Result Validation:** Retrieved content can be analyzed for relevance and accuracy before being included in responses
159 |
160 | Effective agentic RAG systems require careful consideration of several key aspects. The agent **should select between available tools based on the query type and context**. Memory systems help maintain conversation history and avoid repetitive retrievals. Having fallback strategies ensures the system can still provide value even when primary retrieval methods fail. Additionally, implementing validation steps helps ensure the accuracy and relevance of retrieved information.
161 |
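As a rough sketch of the tool-selection and fallback idea, assuming the `party_planning_retriever`, `HfApiModel`, and `DuckDuckGoSearchTool` from the examples above are available, we can give the agent both tools and let it decide which one to use, falling back to web search when the knowledge base comes up empty:

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# The agent can query the local knowledge base first and fall back to
# web search when no relevant ideas are retrieved.
fallback_agent = CodeAgent(
    tools=[party_planning_retriever, DuckDuckGoSearchTool()],
    model=HfApiModel(),
)

response = fallback_agent.run(
    "Find entertainment ideas for the gala. If the knowledge base has nothing relevant, search the web."
)
print(response)
```
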
162 | ## Resources
163 |
164 | - [Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀](https://huggingface.co/learn/cookbook/agent_rag) - Recipe for developing an Agentic RAG system using smolagents.
165 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/tool_calling_agents.mdx:
--------------------------------------------------------------------------------
1 | <CourseFloatingBanner chapter={2}
2 | classNames="absolute z-10 right-0 top-0"
3 | notebooks={[
4 | {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tool_calling_agents.ipynb"},
5 | ]} />
6 |
7 | # Writing actions as code snippets or JSON blobs
8 |
9 | <Tip>
10 | You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tool_calling_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
11 | </Tip>
12 |
13 | Tool Calling Agents are the second type of agent available in `smolagents`. Unlike Code Agents that use Python snippets, these agents **use the built-in tool-calling capabilities of LLM providers** to generate tool calls as **JSON structures**. This is the standard approach used by OpenAI, Anthropic, and many other providers.
14 |
15 | Let's look at an example. When Alfred wants to search for catering services and party ideas, a `CodeAgent` would generate and run Python code like this:
16 |
17 | ```python
18 | for query in [
19 | "Best catering services in Gotham City",
20 | "Party theme ideas for superheroes"
21 | ]:
22 | print(web_search(f"Search for: {query}"))
23 | ```
24 |
25 | A `ToolCallingAgent` would instead create a JSON structure:
26 |
27 | ```python
28 | [
29 | {"name": "web_search", "arguments": "Best catering services in Gotham City"},
30 | {"name": "web_search", "arguments": "Party theme ideas for superheroes"}
31 | ]
32 | ```
33 |
34 | This JSON blob is then used to execute the tool calls.
35 |
36 | While `smolagents` primarily focuses on `CodeAgents` since [they perform better overall](https://arxiv.org/abs/2402.01030), `ToolCallingAgents` can be effective for simple systems that don't require variable handling or complex tool calls.
37 |
38 | ![Code vs JSON Actions](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png)
39 |
40 | ## How Do Tool Calling Agents Work?
41 |
42 | Tool Calling Agents follow the same multi-step workflow as Code Agents (see the [previous section](./code_agents) for details).
43 |
44 | The key difference is in **how they structure their actions**: instead of executable code, they **generate JSON objects that specify tool names and arguments**. The system then **parses these instructions** to execute the appropriate tools.
45 |
46 | ## Example: Running a Tool Calling Agent
47 |
48 | Let's revisit the previous example where Alfred started party preparations, but this time we'll use a `ToolCallingAgent` to highlight the difference. We'll build an agent that can search the web using DuckDuckGo, just like in our Code Agent example. The only difference is the agent type - the framework handles everything else:
49 |
50 | ```python
51 | from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel
52 |
53 | agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
54 |
55 | agent.run("Search for the best music recommendations for a party at the Wayne's mansion.")
56 | ```
57 |
58 | When you examine the agent's trace, instead of seeing `Executing parsed code:`, you'll see something like:
59 |
60 | ```text
61 | ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
62 | │ Calling tool: 'web_search' with arguments: {'query': "best music recommendations for a party at Wayne's │
63 | │ mansion"} │
64 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
65 | ```
66 |
67 | The agent generates a structured tool call that the system processes to produce the output, rather than directly executing code like a `CodeAgent`.
68 |
69 | Now that we understand both agent types, we can choose the right one for our needs. Let's continue exploring `smolagents` to make Alfred's party a success! 🎉
70 |
71 | ## Resources
72 |
73 | - [ToolCallingAgent documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/agents#smolagents.ToolCallingAgent) - Official documentation for ToolCallingAgent
74 |
--------------------------------------------------------------------------------
/units/en/unit2/smolagents/why_use_smolagents.mdx:
--------------------------------------------------------------------------------
1 | ![smolagents banner](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/license_to_call.png)
2 | # Why use smolagents
3 |
4 | In this module, we will explore the pros and cons of using [smolagents](https://huggingface.co/docs/smolagents/en/index), helping you make an informed decision about whether it's the right framework for your needs.
5 |
6 | ## What is `smolagents`?
7 |
8 | `smolagents` is a simple yet powerful framework for building AI agents. It provides LLMs with the _agency_ to interact with the real world, such as searching or generating images.
9 |
10 | As we learned in unit 1, AI agents are programs that use LLMs to generate **'thoughts'** based on **'observations'** to perform **'actions'**. Let's explore how this is implemented in smolagents.
11 |
12 | ### Key Advantages of `smolagents`
13 | - **Simplicity:** Minimal code complexity and abstractions, to make the framework easy to understand, adopt and extend
14 | - **Flexible LLM Support:** Works with any LLM through integration with Hugging Face tools and external APIs
15 | - **Code-First Approach:** First-class support for Code Agents that write their actions directly in code, removing the need for parsing and simplifying tool calling
16 | - **HF Hub Integration:** Seamless integration with the Hugging Face Hub, allowing the use of Gradio Spaces as tools
17 |
18 | ### When to use smolagents?
19 |
20 | With these advantages in mind, when should we use smolagents over other frameworks?
21 |
22 | smolagents is ideal when:
23 | - You need a **lightweight and minimal solution.**
24 | - You want to **experiment quickly** without complex configurations.
25 | - Your **application logic is straightforward.**
26 |
27 | ### Code vs. JSON Actions
28 | Unlike other frameworks where agents write actions in JSON, `smolagents` **focuses on tool calls in code**, simplifying the execution process. This is because there's no need to parse the JSON in order to build code that calls the tools: the output can be executed directly.
29 |
30 | The following diagram illustrates this difference:
31 |
32 | ![Code vs. JSON actions](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png)
33 |
34 | To review the difference between Code vs JSON Actions, you can revisit [the Actions Section in Unit 1](https://huggingface.co/learn/agents-course/unit1/actions#actions-enabling-the-agent-to-engage-with-its-environment).
35 |
36 | ### Agent Types in `smolagents`
37 |
38 | Agents in `smolagents` operate as **multi-step agents**.
39 |
40 | Each [`MultiStepAgent`](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.MultiStepAgent) performs:
41 | - One thought
42 | - One tool call and execution
43 |
44 | In addition to using **[CodeAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.CodeAgent)** as the primary type of agent, smolagents also supports **[ToolCallingAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.ToolCallingAgent)**, which writes tool calls in JSON.
45 |
46 | We will explore each agent type in more detail in the following sections.
47 |
48 | <Tip>
49 | In smolagents, tools are defined using the <code>@tool</code> decorator wrapping a Python function, or with the <code>Tool</code> class; a short sketch of the decorator approach follows this tip.
50 | </Tip>
51 |
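As a minimal sketch of the decorator approach (the tool itself is made up for illustration), a plain Python function with type hints and a documented `Args` section becomes a tool:

```python
from smolagents import tool

@tool
def guest_count(confirmed: int, plus_ones: int) -> int:
    """Returns the total number of expected guests.

    Args:
        confirmed: Number of guests who confirmed attendance.
        plus_ones: Number of additional guests they bring along.
    """
    return confirmed + plus_ones
```
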
52 | ### Model Integration in `smolagents`
53 | `smolagents` supports flexible LLM integration, allowing you to use any callable model that meets [certain criteria](https://huggingface.co/docs/smolagents/main/en/reference/models). The framework provides several predefined classes to simplify model connections:
54 |
55 | - **[TransformersModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.TransformersModel):** Implements a local `transformers` pipeline for seamless integration.
56 | - **[HfApiModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.HfApiModel):** Supports [serverless inference](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) calls through [Hugging Face's infrastructure](https://huggingface.co/docs/api-inference/index), or via a growing number of [third-party inference providers](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#supported-providers-and-tasks).
57 | - **[LiteLLMModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.LiteLLMModel):** Leverages [LiteLLM](https://www.litellm.ai/) for lightweight model interactions.
58 | - **[OpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.OpenAIServerModel):** Connects to any service that offers an OpenAI API interface.
59 | - **[AzureOpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.AzureOpenAIServerModel):** Supports integration with any Azure OpenAI deployment.
60 |
61 | This flexibility ensures that developers can choose the model and service most suitable for their specific use cases, and allows for easy experimentation.
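
For instance, a minimal sketch of plugging one of these classes into an agent (the model choice here is just an example):

```python
from smolagents import CodeAgent, HfApiModel

# Serverless inference through the Hugging Face Inference API
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(tools=[], model=model)
agent.run("What is the 20th Fibonacci number?")
```

Swapping in `LiteLLMModel` or `OpenAIServerModel` typically only changes the `model` line, which is what makes experimentation easy.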
62 |
63 | Now that we understand why and when to use smolagents, let's dive deeper into this powerful library!
64 |
65 | ## Resources
66 |
67 | - [smolagents Blog](https://huggingface.co/blog/smolagents) - Introduction to smolagents and code interactions
68 |
--------------------------------------------------------------------------------
/units/en/unit3/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/agents-course/main/units/en/unit3/README.md
--------------------------------------------------------------------------------
/units/en/unit3/agentic-rag/agentic-rag.mdx:
--------------------------------------------------------------------------------
1 | # Agentic Retrieval Augmented Generation (RAG)
2 |
3 | In this unit, we'll be taking a look at how we can use Agentic RAG to help Alfred prepare for the amazing gala.
4 |
5 | <Tip>We know we've already discussed Retrieval Augmented Generation (RAG) and agentic RAG in the previous unit, so feel free to skip ahead if you're already familiar with the concepts.</Tip>
6 |
7 | LLMs are trained on enormous bodies of data to learn general knowledge.
8 | However, an LLM's knowledge of the world may not always be relevant or up to date.
9 | **RAG solves this problem by finding and retrieving relevant information from your data and forwarding that to the LLM.**
10 |
11 | ![RAG](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/rag.png)
12 |
13 | Now, think about how Alfred works:
14 |
15 | 1. We've asked Alfred to help plan a gala
16 | 2. Alfred needs to find the latest news and weather information
17 | 3. Alfred needs to structure and search the guest information
18 |
19 | Just as Alfred needs to search through your household information to be helpful, any agent needs a way to find and understand relevant data.
20 | **Agentic RAG is a powerful way to use agents to answer questions about your data.** We can pass various tools to Alfred to help him answer questions.
21 | However, instead of automatically answering the question on top of the documents, Alfred can decide to use any other tool or flow to answer it.
22 |
23 | ![Agentic RAG](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/agentic-rag.png)
24 |
25 | Let's start **building our agentic RAG workflow!**
26 |
27 | First, we'll create a RAG tool to retrieve up-to-date details about the invitees. Next, we'll develop tools for web search, weather updates, and Hugging Face Hub model download statistics. Finally, we'll integrate everything to bring our agentic RAG agent to life!
28 |
--------------------------------------------------------------------------------
/units/en/unit3/agentic-rag/conclusion.mdx:
--------------------------------------------------------------------------------
1 | # Conclusion
2 |
3 | In this unit, we've learned how to create an agentic RAG system to help Alfred, our friendly neighborhood agent, prepare for and manage an extravagant gala.
4 |
5 | The combination of RAG with agentic capabilities demonstrates how powerful AI assistants can become when they have:
6 | - Access to structured knowledge (guest information)
7 | - Ability to retrieve real-time information (web search)
8 | - Domain-specific tools (weather information, Hub stats)
9 | - Memory of past interactions
10 |
11 | With these capabilities, Alfred is now well-equipped to be the perfect host, able to answer questions about guests, provide up-to-date information, and ensure the gala runs smoothly—even managing the perfect timing for the fireworks display!
12 |
13 | <Tip>
14 | Now that you've built a complete agent, you might want to explore:
15 |
16 | - Creating more specialized tools for your own use cases
17 | - Implementing more sophisticated RAG systems with embeddings
18 | - Building multi-agent systems where agents can collaborate
19 | - Deploying your agent as a service that others can interact with
20 |
21 | </Tip>
22 |
--------------------------------------------------------------------------------
/units/en/unit3/agentic-rag/introduction.mdx:
--------------------------------------------------------------------------------
1 | # Introduction to Use Case for Agentic RAG
2 |
3 | ![Agentic RAG banner](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit3/agentic-rag/thumbnail.jpg)
4 |
5 | In this unit, we will help Alfred, our friendly agent who is hosting the gala, by using Agentic RAG to create a tool that can be used to answer questions about the guests at the gala.
6 |
7 | <Tip>
8 | This is a 'real-world' use case for Agentic RAG that you could use in your own projects or workplaces. If you want to get more out of this project, why not try it out on your own use case and share your results in Discord?
9 | </Tip>
10 |
11 |
12 | You can choose any of the frameworks discussed in the course for this use case. We provide code samples for each in separate tabs.
13 |
14 | ## A Gala to Remember
15 |
16 | Now, it's time to get our hands dirty with an actual use case. Let's set the stage!
17 |
18 | **You decided to host the most extravagant and opulent party of the century.** This means lavish feasts, enchanting dancers, renowned DJs, exquisite drinks, a breathtaking fireworks display, and much more.
19 |
20 | Alfred, your friendly neighbourhood agent, is getting ready to watch over all of your needs for this party, and **Alfred is going to manage everything himself**. To do so, he needs to have access to all of the information about the party, including the menu, the guests, the schedule, weather forecasts, and much more!
21 |
22 | Not only that, but he also needs to make sure that the party is going to be a success, so **he needs to be able to answer any questions about the party during the party**, whilst handling unexpected situations that may arise.
23 |
24 | He can't do this alone, so we need to make sure that Alfred has access to all of the information and tools he needs.
25 |
26 | First, let's give him a list of hard requirements for the gala.
27 |
28 | ## The Gala Requirements
29 |
30 | A properly educated person in the age of the **Renaissance** needed to have three main traits:
31 | a deep **knowledge of sports, culture, and science**. So, we need to make sure we can impress our guests with our knowledge and provide them with a truly unforgettable gala.
32 | However, to avoid any conflicts, there are some **topics, like politics and religion, that are to be avoided at a gala.** It needs to be a fun party without conflicts related to beliefs and ideals.
33 |
34 | According to etiquette, **a good host should be aware of guests' backgrounds**, including their interests and endeavours. A good host also shares stories and a little gossip about each guest with the others.
35 |
36 | Lastly, we need to make sure that we've got **some general knowledge about the weather** so we can continuously fetch real-time updates and time the fireworks perfectly, ending the gala with a bang! 🎆
37 |
38 | As you can see, Alfred needs a lot of information to host the gala.
39 | Luckily, we can prepare Alfred by giving him some **Retrieval Augmented Generation (RAG) training!**
40 |
41 | Let's start by creating the tools that Alfred needs to be able to host the gala!
42 |
--------------------------------------------------------------------------------
/units/en/unit4/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/agents-course/main/units/en/unit4/README.md
--------------------------------------------------------------------------------