mcvaha · April 26, 2025 22:31
diff --git a/Final Assignment b/Final Assignment
 └── units
    └── en
        └── unit4
            ├── additional-readings.mdx
            ├── conclusion.mdx
            ├── get-your-certificate.mdx
            ├── hands-on.mdx
            ├── introduction.mdx
            └── what-is-gaia.mdx


 /units/en/unit4/additional-readings.mdx:
 --------------------------------------------------------------------------------
 1 | # And now? What topics should I learn?
 2 | 
 3 | Agentic AI is a rapidly evolving field, and understanding foundational protocols is essential for building intelligent, autonomous systems. 
 4 | 
 5 | Two important standards you should get familiar with are:
 6 | 
 7 | - The **Model Context Protocol (MCP)**  
 8 | - The **Agent-to-Agent Protocol (A2A)**
 9 | 
 10 | ## 🔌 Model Context Protocol (MCP)
 11 | 
 12 | The **Model Context Protocol (MCP)** by Anthropic is an open standard that enables AI models to securely and seamlessly **connect with external tools, data sources, and applications**, making agents more capable and autonomous.
 13 | 
 14 | Think of MCP as a **universal adapter**, like a USB-C port, that allows AI models to plug into various digital environments **without needing custom integration for each one**.
 15 | 
 16 | MCP is quickly gaining traction across the industry, with major companies like OpenAI and Google beginning to adopt it. 
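To make the "universal adapter" idea concrete: MCP messages are JSON-RPC 2.0 requests and responses. An agent asks a server which tools it offers with `tools/list`, then runs one with `tools/call`. The sketch below only builds those two request bodies as Python dicts. It is not a working client: a real one would first perform MCP's `initialize` handshake and send the messages over a transport such as stdio or HTTP. The `get_weather` tool is hypothetical.

```python
import json


def make_tools_list_request(request_id: int) -> dict:
    """JSON-RPC 2.0 request asking an MCP server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def make_tool_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 request asking an MCP server to run one of its tools."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "get_weather" is a hypothetical tool name, used only for illustration.
    request = make_tool_call_request(2, "get_weather", {"location": "Paris"})
    print(json.dumps(request))
```

In practice you would rarely hand-build these messages; an official MCP SDK handles the handshake, the transport, and response parsing for you.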
 17 | 
 18 | 📚 Learn more:
 19 | - [Anthropic's official announcement and documentation](https://www.anthropic.com/news/model-context-protocol)
 20 | - [MCP on Wikipedia](https://en.wikipedia.org/wiki/Model_Context_Protocol)
 21 | - [Blog on MCP](https://huggingface.co/blog/Kseniase/mcp)
 22 | 
 23 | ## 🤝 Agent-to-Agent (A2A) Protocol
 24 | 
 25 | Google has developed the **Agent-to-Agent (A2A) protocol** as a complement to Anthropic's Model Context Protocol (MCP).
 26 | 
 27 | While MCP connects agents to external tools, **A2A connects agents to each other**, paving the way for cooperative, multi-agent systems that can work together to solve complex problems.
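Part of how A2A lets agents find each other is capability discovery: each agent publishes a JSON description of itself, which A2A calls an "Agent Card". The sketch below is a deliberately simplified, hypothetical illustration of that idea in plain Python. The `AgentCard` fields, the example URLs, and the `find_agent_for` helper are assumptions made for illustration, not the A2A schema or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentCard:
    # Simplified, hypothetical fields; the real A2A Agent Card carries more information.
    name: str
    url: str
    skills: list[str] = field(default_factory=list)


def find_agent_for(skill: str, cards: list[AgentCard]) -> AgentCard | None:
    """Return the first advertised agent that lists the requested skill, if any."""
    for card in cards:
        if skill in card.skills:
            return card
    return None


if __name__ == "__main__":
    cards = [
        AgentCard("translator", "https://example.com/translator", ["translate"]),
        AgentCard("researcher", "https://example.com/researcher", ["web_search", "summarize"]),
    ]
    print(find_agent_for("summarize", cards).name)  # researcher
```

Once a suitable agent is found, a real A2A client would send it a task over HTTP; that exchange is outside the scope of this sketch.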
 28 | 
 29 | 📚 Dive deeper into A2A:  
 30 | - [Google’s A2A announcement](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/)
 31 | 


 --------------------------------------------------------------------------------
 /units/en/unit4/conclusion.mdx:
 --------------------------------------------------------------------------------
 1 | # Conclusion
 2 | 
 3 | **Congratulations on finishing the Agents Course!** 
 4 | 
 5 | Through perseverance and dedication, you’ve built a solid foundation in the world of AI Agents.
 6 | 
 7 | But finishing this course is **not the end of your journey**. It’s just the beginning: don’t hesitate to explore the next section, where we share curated resources to help you continue learning, including advanced topics like **MCP** and beyond.
 8 | 
 9 | **Thank you** for being part of this course. **We hope you liked this course as much as we loved writing it**.
 10 | 
 11 | And don’t forget: **Keep Learning, Stay Awesome 🤗**


 --------------------------------------------------------------------------------
 /units/en/unit4/get-your-certificate.mdx:
 --------------------------------------------------------------------------------
 1 | # Claim Your Certificate 🎓
 2 | 
 3 | If you scored **30% or higher, congratulations! 👏 You're now eligible to claim your official certificate.**
 4 | 
 5 | Follow the steps below to receive it:
 6 | 
 7 | 1. Visit the [certificate page](https://huggingface.co/spaces/agents-course/Unit4-Final-Certificate).
 8 | 2. **Sign in** with your Hugging Face account using the button provided.
 9 | 3. **Enter your full name**. This is the name that will appear on your certificate.
 10 | 4. Click **“Get My Certificate”** to verify your score and download your certificate.
 11 | 
 12 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit4/congrats.png" alt="Congrats!" />
 13 | 
 14 | Once you’ve got your certificate, feel free to:
 15 | - Add it to your **LinkedIn profile** 🧑‍💼  
 16 | - Share it on **X**, **Bluesky**, etc. 🎉
 17 | 
 18 | **Don’t forget to tag [@huggingface](https://huggingface.co/huggingface). We’d be super proud and we’d love to cheer you on! 🤗**
 19 | 


 --------------------------------------------------------------------------------
 /units/en/unit4/hands-on.mdx:
 --------------------------------------------------------------------------------
 1 | # Hands-On
 2 | 
 3 | Now that you’re ready to dive deeper into the creation of your final agent, let’s see how you can submit it for review.
 4 | 
 5 | ## The Dataset 
 6 | 
 7 | The dataset used in this leaderboard consists of 20 questions extracted from the level 1 questions of the **validation** set of GAIA. 
 8 | The chosen questions were filtered based on the number of tools and steps needed to answer them.
 9 | 
 10 | Based on the current state of the GAIA leaderboard, we think that aiming for 30% on level 1 questions is a fair test.
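Concretely, with 20 questions the 30% target means getting at least 6 answers exactly right. The small helpers below are a hypothetical illustration, not part of the course tooling; they do the arithmetic with integers to avoid floating-point rounding right at the boundary.

```python
def min_correct_to_pass(total_questions: int = 20, threshold_percent: int = 30) -> int:
    """Smallest number of correct answers whose score is at least threshold_percent."""
    # Integer ceiling division: ceil(threshold_percent * total_questions / 100).
    return -(-threshold_percent * total_questions // 100)


def passes(correct: int, total_questions: int = 20, threshold_percent: int = 30) -> bool:
    """True if correct / total_questions is at least threshold_percent percent."""
    return correct * 100 >= threshold_percent * total_questions


print(min_correct_to_pass())  # 6
print(passes(5), passes(6))   # False True
```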
 11 | 
 12 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit4/leaderboard%20GAIA%2024%3A04%3A2025.png" alt="GAIA current status!" />
 13 | 
 14 | ## The process 
 15 | 
 16 | Now the big question in your mind is probably: "How do I start submitting?"
 17 | 
 18 | For this unit, we created an API that lets you fetch the questions and send your answers for scoring.
 19 | Here is a summary of the routes (see the [live documentation](https://agents-course-unit4-scoring.hf.space/docs) for interactive details):
 20 | 
 21 | * **`GET /questions`**: Retrieve the full list of filtered evaluation questions.
 22 | * **`GET /random-question`**: Fetch a single random question from the list.
 23 | * **`GET /files/{task_id}`**: Download a specific file associated with a given task ID.
 24 | * **`POST /submit`**: Submit agent answers, calculate the score, and update the leaderboard.
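As a minimal sketch, the GET routes above can be called with nothing but Python's standard library. The base URL is taken from the live documentation link; the helper names are hypothetical, and anything about the response shapes should be checked against the live docs before you rely on it.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the scoring API; its interactive docs are served at /docs.
API_URL = "https://agents-course-unit4-scoring.hf.space"


def route_url(route: str, base_url: str = API_URL) -> str:
    """Join the base URL and a route such as '/questions' with exactly one slash."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


def file_url(task_id: str, base_url: str = API_URL) -> str:
    """URL of the file attached to a task (GET /files/{task_id})."""
    return route_url("/files/" + urllib.parse.quote(task_id), base_url)


def get_json(route: str, base_url: str = API_URL, timeout: float = 30.0):
    """GET a route such as '/questions' or '/random-question' and decode its JSON body."""
    with urllib.request.urlopen(route_url(route, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("/questions")` should return the full question list (assumed, as in the template, to be a list of objects with `task_id` and `question` keys), and `urllib.request.urlretrieve(file_url(task_id), "attachment")` saves a task's file locally. Not every task necessarily has an attached file, so be prepared for that request to fail.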
 25 | 
 26 | The submit endpoint compares each answer to the ground truth using **EXACT MATCH**, so prompt your agent carefully! The GAIA team shared a prompting example for your agent [here](https://huggingface.co/spaces/gaia-benchmark/leaderboard).
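Because scoring is an exact string comparison, any extra text around the answer (reasoning, a trailing explanation, or the `FINAL ANSWER:` label itself) would most likely make it count as wrong. Assuming your agent is prompted, as in the GAIA team's example, to finish with a line of the form `FINAL ANSWER: ...`, a small post-processing step such as this hypothetical helper can strip everything but the answer before submission.

```python
def extract_final_answer(model_output: str, marker: str = "FINAL ANSWER:") -> str:
    """Return the text after the last `marker`, with surrounding whitespace removed.

    If the marker is missing, the whole output is returned (trimmed) instead.
    """
    _, found, tail = model_output.rpartition(marker)
    answer = tail if found else model_output
    return answer.strip()
```

For example, `extract_final_answer("...reasoning...\nFINAL ANSWER: 42")` returns `"42"`. This does not fix formatting issues such as units or list separators; those still have to come from the prompt.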
 27 | 
 28 | 🎨 **Make the Template Your Own!**
 29 | 
 30 | To demonstrate the process of interacting with the API, we've included a [basic template](https://huggingface.co/spaces/agents-course/Final_Assignment_Template) as a starting point.
 31 | You are free, and **actively encouraged**, to change, add to, or completely restructure it! Modify it in any way that best suits your approach and creativity.
 32 | 
 33 | To make a submission, the template computes the 3 things the API needs:
 34 | 
 35 | * **Username:** Your Hugging Face username (here obtained via Gradio login), which is used to identify your submission.
 36 | * **Code Link (`agent_code`):** The URL of your Hugging Face Space code (`.../tree/main`), used for verification purposes, so please keep your Space public.
 37 | * **Answers (`answers`):** The list of responses (`{"task_id": ..., "submitted_answer": ...}`) generated by your Agent for scoring.
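Put together, a submission is one JSON body containing those three fields. The minimal sketch below builds that body and POSTs it to `/submit` using only the standard library. The `username` key name is an assumption based on the template, the helper names are hypothetical, and the `agent_code` value you pass should be your own Space's code URL (the `.../tree/main` link described above).

```python
from __future__ import annotations

import json
import urllib.request

API_URL = "https://agents-course-unit4-scoring.hf.space"


def build_submission(username: str, agent_code: str, answers: dict[str, str]) -> dict:
    """Assemble the /submit body from {task_id: answer} pairs.

    The "username" key is assumed from the template; "agent_code" and
    "answers" (items with "task_id" and "submitted_answer") follow the list above.
    """
    return {
        "username": username.strip(),
        "agent_code": agent_code,
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


def submit(payload: dict, base_url: str = API_URL, timeout: float = 60.0) -> dict:
    """POST the payload as JSON to /submit and return the decoded response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Build the dict with `build_submission(...)` and pass it to `submit(...)` only once your answers are final; the decoded response should report your score.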
 38 | 
 39 | We therefore encourage you to start by duplicating this [template](https://huggingface.co/spaces/agents-course/Final_Assignment_Template) to your own Hugging Face profile.
 40 | 
 41 | 🏆 Check out the leaderboard [here](https://huggingface.co/spaces/agents-course/Students_leaderboard)
 42 | 
 43 | *A friendly note: This leaderboard is meant for fun! We know it's possible to submit scores without full verification. If we see too many high scores posted without a public link to back them up, we might need to review, adjust, or remove some entries to keep the leaderboard useful.*
 44 | The leaderboard shows a link to your Space's codebase. Since this leaderboard is for students only, please keep your Space public if you get a score you're proud of.
 45 | <iframe
 46 | 	src="https://agents-course-students-leaderboard.hf.space"
 47 | 	frameborder="0"
 48 | 	width="850"
 49 | 	height="450"
 50 | ></iframe>


 --------------------------------------------------------------------------------
 /units/en/unit4/introduction.mdx:
 --------------------------------------------------------------------------------
 1 | # Welcome to the final Unit [[introduction]]
 2 | 
 3 | <img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit4/thumbnail.jpg" alt="AI Agents Course thumbnail" width="100%"/>
 4 | 
 5 | Welcome to the final unit of the course! 🎉
 6 | 
 7 | So far, you’ve **built a strong foundation in AI Agents**, from understanding their components to creating your own. With this knowledge, you’re now ready to **build powerful agents** and stay up-to-date with the latest advancements in this fast-evolving field.
 8 | 
 9 | This unit is all about applying what you’ve learned. It’s your **final hands-on project**, and completing it is your ticket to earning the **course certificate**.
 10 | 
 11 | ## What’s the challenge?
 12 | 
 13 | You’ll create your own agent and **evaluate its performance using a subset of the [GAIA benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard)**.
 14 | 
 15 | To successfully complete the course, your agent needs to score **30% or higher** on the benchmark. Achieve that, and you’ll earn your **Certificate of Completion**, officially recognizing your expertise. 🏅
 16 | 
 17 | Additionally, see how you stack up against your peers! A dedicated **[Student Leaderboard](https://huggingface.co/spaces/agents-course/Students_leaderboard)** is available for you to submit your scores and see the community's progress.
 18 | 
 19 | > **🚨 Heads Up: Advanced & Hands-On Unit**
 20 | >
 21 | > Please be aware that this unit shifts towards a more practical, hands-on approach. Success in this section will require **more advanced coding knowledge** and relies on you navigating tasks with **less explicit guidance** compared to earlier parts of the course.
 22 | 
 23 | Sounds exciting? Let’s get started! 🚀


 --------------------------------------------------------------------------------
 /units/en/unit4/what-is-gaia.mdx:
 --------------------------------------------------------------------------------
 1 | # What is GAIA?
 2 | 
 3 | [GAIA](https://huggingface.co/papers/2311.12983) is a **benchmark designed to evaluate AI assistants on real-world tasks** that require a combination of core capabilities—such as reasoning, multimodal understanding, web browsing, and proficient tool use.
 4 | 
 5 | It was introduced in the paper _"[GAIA: A Benchmark for General AI Assistants](https://huggingface.co/papers/2311.12983)"_.
 6 | 
 7 | The benchmark features **466 carefully curated questions** that are **conceptually simple for humans**, yet **remarkably challenging for current AI systems**. 
 8 | 
 9 | To illustrate the gap:
 10 | - **Humans**: ~92% success rate  
 11 | - **GPT-4 with plugins**: ~15%  
 12 | - **Deep Research (OpenAI)**: 67.36% on the validation set
 13 | 
 14 | GAIA highlights the current limitations of AI models and provides a rigorous benchmark to evaluate progress toward truly general-purpose AI assistants.
 15 | 
 16 | ## 🌱 GAIA’s Core Principles
 17 | 
 18 | GAIA is carefully designed around the following pillars:
 19 | 
 20 | - 🔍 **Real-world difficulty**: Tasks require multi-step reasoning, multimodal understanding, and tool interaction.
 21 | - 🧾 **Human interpretability**: Despite their difficulty for AI, tasks remain conceptually simple and easy to follow for humans.
 22 | - 🛡️ **Non-gameability**: Correct answers demand full task execution, making brute-forcing ineffective.
 23 | - 🧰 **Simplicity of evaluation**: Answers are concise, factual, and unambiguous—ideal for benchmarking.
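Because answers are short and factual, they can be checked automatically. The function below is a simplified sketch of that kind of check, loosely inspired by (and not identical to) the normalization in GAIA's official scorer: it compares numbers numerically, comma- or semicolon-separated lists element by element, and other strings after lowercasing and removing whitespace and punctuation. The helper names are hypothetical, and the course's own scoring API uses exact match rather than this normalization.

```python
from __future__ import annotations

import re
import string


def _normalize_text(text: str) -> str:
    """Lowercase and drop all whitespace and punctuation."""
    text = re.sub(r"\s+", "", text.lower())
    return text.translate(str.maketrans("", "", string.punctuation))


def _as_number(text: str) -> float | None:
    """Parse strings like '1,234', '$5' or '12%' as a number; None if not numeric."""
    cleaned = text.strip().replace(",", "").replace("$", "").replace("%", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def answers_match(predicted: str, truth: str) -> bool:
    """Simplified normalized comparison of a predicted answer with the ground truth."""
    truth_number = _as_number(truth)
    if truth_number is not None:
        return _as_number(predicted) == truth_number
    if "," in truth or ";" in truth:
        # Lists are compared element by element, so order matters.
        pred_items = re.split(r"[,;]", predicted)
        truth_items = re.split(r"[,;]", truth)
        return len(pred_items) == len(truth_items) and all(
            answers_match(p, t) for p, t in zip(pred_items, truth_items)
        )
    return _normalize_text(predicted) == _normalize_text(truth)
```

List order matters in this sketch, which is why questions such as the example later on this page spell out the required ordering.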
 24 | 
 25 | ## Difficulty Levels
 26 | 
 27 | GAIA tasks are organized into **three levels of increasing complexity**, each testing specific skills:
 28 | 
 29 | - **Level 1**: Requires fewer than 5 steps and minimal tool usage.
 30 | - **Level 2**: Involves more complex reasoning, coordination between multiple tools, and 5 to 10 steps.
 31 | - **Level 3**: Demands long-term planning and advanced integration of various tools.
 32 | 
 33 | ![GAIA levels](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit4/gaia_levels.png)
 34 | 
 35 | ## Example of a Hard GAIA Question
 36 | 
 37 | > Which of the fruits shown in the 2008 painting "Embroidery from Uzbekistan" were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film "The Last Voyage"? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o'clock position. Use the plural form of each fruit.
 38 | 
 39 | As you can see, this question challenges AI systems in several ways:
 40 | 
 41 | - Requires a **structured response format**
 42 | - Involves **multimodal reasoning** (e.g., analyzing images)
 43 | - Demands **multi-hop retrieval** of interdependent facts:
 44 |   - Identifying the fruits in the painting
 45 |   - Discovering which ocean liner was used in *The Last Voyage*
 46 |   - Looking up the breakfast menu from October 1949 for that ship
 47 | - Needs **correct sequencing** and high-level planning to solve in the right order
 48 | 
 49 | This kind of task highlights where standalone LLMs often fall short, making GAIA an ideal benchmark for **agent-based systems** that can reason, retrieve, and execute over multiple steps and modalities.
 50 | 
 51 | ![GAIA capabilities plot](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit4/gaia_capabilities.png)
 52 | 
 53 | ## Live Evaluation
 54 | 
 55 | To encourage continuous benchmarking, **GAIA provides a public leaderboard hosted on Hugging Face**, where you can evaluate your models on **300 test questions**.
 56 | 
 57 | 👉 Check out the leaderboard [here](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
 58 | 
 59 | <iframe
 60 | 	src="https://gaia-benchmark-leaderboard.hf.space"
 61 | 	frameborder="0"
 62 | 	width="850"
 63 | 	height="450"
 64 | ></iframe>
 65 | 
 66 | Want to dive deeper into GAIA?
 67 | 
 68 | - 📄 [Read the full paper](https://huggingface.co/papers/2311.12983)
 69 | - 📄 [Deep Research release post by OpenAI](https://openai.com/index/introducing-deep-research/)
 70 | - 📄 [Open-source DeepResearch – Freeing our search agents](https://huggingface.co/blog/open-deep-research)


 --------------------------------------------------------------------------------