| [130] (.mini-env) ~/w/mini-swe-agent (main*) ❯ python src/minisweagent/run/mini_manager.py | |
| 👋 This is mini-swe-agent version 1.3.0. | |
| Your config is stored in '/Users/j/Library/Application Support/mini-swe-agent/.env' | |
| What do you want to do? | |
| Let TDD Reviewer review README.md | |
| Got that, thanks! | |
| System: | |
| You are a helpful assistant that can interact with a computer. | |
| Your response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||). | |
| Include a THOUGHT section before your command where you explain your reasoning process. | |
| Format your response as shown in <format_example>. | |
| <format_example> | |
| Your reasoning and analysis here. Explain why you want to perform the action. | |
| ```bash | |
| your_command_here | |
| ``` | |
| </format_example> | |
| Failure to follow these rules will cause your response to be rejected. | |
| To finish, issue the following command: `echo MINI_SWE_AGENT_FINAL_OUTPUT` | |
| without any other command. | |
| <manager_system_template> | |
| <subagent_registry> | |
| Available subagents: | |
| - tdd-reviewer: Ask TDD Reviewer to review when you are planning (concrete todos), changing code, or both. TDD Reviewer helps ensure that your todos or code changes | |
| follow proper TDD red-green-refactor-commit methodology. You can ask TDD Reviewer to either 1) review code or 2) review todos. | |
| To delegate a task to a specific subagent, use: | |
| echo "MINI_SWE_AGENT_SPAWN_CHILD::subagent-name | |
| Task description for the subagent" | |
| You MUST specify a subagent name. Generic spawning is not allowed. | |
| </subagent_registry> | |
| </manager_system_template> | |
| User: | |
| Please solve this issue: Let TDD Reviewer review README.md | |
| You can execute bash commands and edit files to implement the necessary changes. | |
| ## Recommended Workflow | |
| 1. If present, you might want to take a look at the following files that set additional guidelines | |
| for your work: CLAUDE.md, .github/copilot-instructions.md. Use a find command to locate all of them at once. | |
| If files for multiple AI systems are present, it's enough to read those for one system. | |
| 2. Analyze the codebase by finding and reading relevant files. | |
| 3. Create a script to reproduce the issue | |
| 4. Edit the source code to resolve the issue | |
| 5. Verify your fix works by running your script again | |
| 6. Test edge cases to ensure your fix is robust | |
| ## Important Rules | |
| 1. Every response must contain exactly one action | |
| 2. The action must be enclosed in triple backticks | |
| 3. Directory or environment variable changes are not persistent. Every action is executed in a new subshell. | |
| However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files (see the sketch after this list) | |
| 4. To finish, issue the following command: `echo MINI_SWE_AGENT_FINAL_OUTPUT`. | |
| Do not combine it with any other command. | |
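| A minimal sketch of the prefix pattern from rule 3 above (the variable name and path are illustrative): | |
| ```bash | |
| # Nothing persists between actions, so set state inline each time: | |
| MY_ENV_VAR=my_value cd /path/to/working/dir && echo "$MY_ENV_VAR in $(pwd)" | |
| ``` | |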
| <system_information> | |
| Darwin 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:43 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T8132 arm64 arm | |
| </system_information> | |
| ## Formatting your response | |
| Here is an example of a correct response: | |
| <example_response> | |
| THOUGHT: I need to understand the structure of the repository first. Let me check what files are in the current directory to get a better understanding of the codebase. | |
| ```bash | |
| ls -la | |
| ``` | |
| </example_response> | |
| ## Useful command examples | |
| ### Create a new file: | |
| ```bash | |
| cat <<'EOF' > newfile.py | |
| import numpy as np | |
| hello = "world" | |
| print(hello) | |
| EOF | |
| ``` | |
| ### Edit files with sed: | |
| <important> | |
| You are on macOS. For all of the examples below, you need to use `sed -i ''` instead of `sed -i`. | |
| </important> | |
| ```bash | |
| # Replace all occurrences | |
| sed -i 's/old_string/new_string/g' filename.py | |
| # Replace only first occurrence | |
| sed -i 's/old_string/new_string/' filename.py | |
| # Replace first occurrence on line 1 | |
| sed -i '1s/old_string/new_string/' filename.py | |
| # Replace all occurrences in lines 1-10 | |
| sed -i '1,10s/old_string/new_string/g' filename.py | |
| ``` | |
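| Per the macOS note above, each of these invocations needs an explicit (empty) backup suffix on BSD sed; a minimal adaptation of the first example: | |
| ```bash | |
| # macOS/BSD sed: '' is the in-place backup suffix (GNU sed takes none) | |
| sed -i '' 's/old_string/new_string/g' filename.py | |
| ``` | |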
| ### View file content: | |
| ```bash | |
| # View specific lines with numbers | |
| nl -ba filename.py | sed -n '10,20p' | |
| ``` | |
| ### Any other command you want to run | |
| ```bash | |
| anything | |
| ``` | |
| ### Delegate to a subagent: | |
| ```bash | |
| # Example: Ask TDD reviewer to review your code changes | |
| echo "MINI_SWE_AGENT_SPAWN_CHILD::tdd-reviewer | |
| Please review the changes I made to the authentication module | |
| and ensure they follow TDD principles" | |
| ``` | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 1, $0.00): | |
| THOUGHT: I need to understand the structure of the repository first and locate the README.md file. Let me start by exploring the current directory to see what files are | |
| available. | |
| ```bash | |
| ls -la | |
| ``` | |
| Execute? Enter to confirm, or Type a comment/command (/h for available commands) | |
| > /y | |
| Switched to yolo mode (applies to all agents). | |
| User: | |
| <returncode>0</returncode> | |
| <output> | |
| total 80 | |
| drwxr-xr-x@ 19 j staff 608 Jul 30 15:43 . | |
| drwxr-xr-x@ 31 j staff 992 Jul 25 03:21 .. | |
| drwxr-xr-x@ 5 j staff 160 Jul 30 15:44 .claude | |
| drwxr-xr-x@ 3 j staff 96 Jul 25 03:21 .cursor | |
| drwxr-xr-x@ 15 j staff 480 Jul 31 14:48 .git | |
| drwxr-xr-x@ 10 j staff 320 Jul 28 23:13 .github | |
| -rw-r--r--@ 1 j staff 3569 Jul 30 16:00 .gitignore | |
| lrwxr-xr-x@ 1 j staff 8 Jul 26 20:47 .mini -> .claude/ | |
| drwxr-xr-x@ 7 j staff 224 Jul 26 23:45 .mini-env | |
| -rw-r--r--@ 1 j staff 956 Jul 30 15:43 .pre-commit-config.yaml | |
| drwxr-xr-x@ 6 j staff 192 Jul 25 22:42 .pytest_cache | |
| drwxr-xr-x@ 3 j staff 96 Jul 25 03:21 .vscode | |
| drwxr-xr-x@ 15 j staff 480 Jul 31 14:03 docs | |
| -rw-r--r--@ 1 j staff 1094 Jul 25 03:21 LICENSE.md | |
| -rw-r--r--@ 1 j staff 4453 Jul 28 23:13 mkdocs.yml | |
| -rw-r--r--@ 1 j staff 6668 Jul 26 23:38 pyproject.toml | |
| -rw-r--r--@ 1 j staff 10394 Jul 30 15:43 README.md | |
| drwxr-xr-x@ 4 j staff 128 Jul 25 11:32 src | |
| drwxr-xr-x@ 12 j staff 384 Jul 26 23:39 tests | |
| </output> | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 2, $0.01): | |
| THOUGHT: Great! I can see there's a README.md file in the current directory. Now I should check for any AI system guidelines as recommended in the workflow. Let me look | |
| for CLAUDE.md and .github/copilot-instructions.md files. | |
| ```bash | |
| find . -name "CLAUDE.md" -o -name "copilot-instructions.md" -o -name ".github/copilot-instructions.md" | |
| ``` | |
| User: | |
| <returncode>0</returncode> | |
| <output> | |
| ./.github/copilot-instructions.md | |
| </output> | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 3, $0.01): | |
| THOUGHT: I found the GitHub Copilot instructions file. Let me read it to understand any additional guidelines for my work. | |
| ```bash | |
| cat ./.github/copilot-instructions.md | |
| ``` | |
| User: | |
| <returncode>0</returncode> | |
| <output> | |
| # Style guide | |
| 1. Target python 3.10 or higher | |
| 2. Use python with type annotations. Use `list` instead of `List`. | |
| 3. Use `pathlib` instead of `os.path`. Use `Path.read_text()` over `with ...open()` constructs. | |
| 4. Use `typer` to add interfaces | |
| 5. Keep code comments to a minimum and only highlight particularly logically challenging things | |
| 6. Do not append to the README unless specifically requested | |
| 7. Use `jinja` for formatting templates | |
| 8. Use `dataclass` for keeping track config | |
| 9. Do not catch exceptions unless explicitly told to. | |
| 10. Write concise, short, minimal code. | |
| 11. In most cases, avoid initializing variables just to pass them to a function. Instead just pass the expression to the function directly. | |
| 12. Not every exception has to be caught. Exceptions are a good way to show problems to a user. | |
| 13. This repository rewards minimal code. Try to be as concise as possible. | |
| Here's an example for rule 11: | |
| ```python | |
| # bad | |
| a = func() | |
| Class(a) | |
| # good | |
| Class(func()) | |
| ``` | |
| ## Test style | |
| 1. Use `pytest`, not `unittest`. | |
| 2. <IMPORTANT>Do not mock/patch anything that you're not explicitly asked to do</IMPORTANT> | |
| 3. Avoid writing trivial tests. Every test should test for at least one, preferably multiple points of failure | |
| 4. Avoid splitting up code in multiple lines like this: `a=func()\n assert a=b`. Instead, just do `assert func() == b` | |
| 5. The first argument to `pytest.mark.parametrize` should be a tuple (not a string! not a list!), the second argument must be a list (not a tuple!). | |
| Here's an example for rule 4: | |
| ```python | |
| # bad | |
| result = func() | |
| assert result == b | |
| # good | |
| assert func() == b | |
| ``` | |
| </output> | |
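| As an aside, style rule 5 above is easy to get backwards; a minimal pytest sketch (the test and its values are hypothetical): | |
| ```python | |
| import pytest | |
| # Rule 5: first argument is a tuple of parameter names, second is a list of cases. | |
| @pytest.mark.parametrize(("a", "b"), [(1, 2), (3, 4)]) | |
| def test_sum_is_odd(a, b): | |
|     assert (a + b) % 2 == 1 | |
| ``` | |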
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 4, $0.01): | |
| THOUGHT: Now I understand the guidelines. Let me read the README.md file to see what content needs to be reviewed by the TDD Reviewer. | |
| ```bash | |
| cat README.md | |
| ``` | |
| User: | |
| <returncode>0</returncode> | |
| <warning> | |
| The output of your last command was too long. | |
| Please try a different command that produces less output. | |
| If you're looking at a file you can try use head, tail or sed to view a smaller number of lines selectively. | |
| If you're using grep or find and it produced too much output, you can use a more selective search pattern. | |
| If you really need to see something from the full command's output, you can redirect output to a file and then search in that file. | |
| </warning><output_head> | |
| <div align="center"> | |
| <a href="https://mini-swe-agent.com/latest/"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/mini-swe-agent-banner.svg" alt="mini-swe-agent | |
| banner" style="height: 7em"/></a> | |
| </div> | |
| <h1>The 100 line AI agent that solves GitHub issues & more</h1> | |
| [](https://mini-swe-agent.com/latest/) | |
| [](https://join.slack.com/t/swe-bench/shared_invite/zt-36pj9bu5s-o3_yXPZ | |
| baH2wVnxnss1EkQ) | |
| [](https://pypi.org/project/mini-swe- | |
| agent/) | |
| In 2024, [SWE-bench](https://github.com/swe-bench/SWE-bench) & [SWE-agent](https://github.com/swe-agent/swe-agent) helped kickstart the coding agent revolution. | |
| We now ask: **What if SWE-agent was 100x smaller, and still worked nearly as well?** | |
| `mini` is for | |
| - **Researchers** who want to **benchmark, fine-tune or RL** without assumptions, bloat, or surprises | |
| - **Developers** who like their tools like their scripts: **short, sharp, and readable** | |
| - **Engineers** who want something **trivial to sandbox & to deploy anywhere** | |
| Here's some details: | |
| - **Minimal**: Just [100 lines of python](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/agents/default.py) (+100 total for | |
| [env](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/environments/local.py), | |
| [model](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/models/litellm_model.py), | |
| [script](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/run/hello_world.py)) — no fancy dependencies! | |
| - **Powerful:** Resolves 65% of GitHub issues in the [SWE-bench verified benchmark](https://www.swebench.com/) (with Claude Sonnet 4). | |
| - **Convenient:** Comes with UIs that turn this into your daily dev swiss army knife! | |
| - **Deployable:** In addition to local envs, you can use **docker**, **podman**, **singularity**, **apptainer**, and more | |
| - **Tested:** [](https://codecov.io/gh/SWE-agent/mini-swe-agent) | |
| - **Cutting edge:** Built by the Princeton & Stanford team behind [SWE-bench](https://swebench.com) and [SWE-agent](https://swe-agent.com). | |
| <details> | |
| <summary>More motivation (for research)</summary> | |
| [SWE-agent](https://swe-agent.com/latest/) jump-started the development of AI agents in 2024. Back then, we placed a lot of emphasis on tools and special interfaces for | |
| the agent. | |
| However, one year later, as LMs have become more capable, a lot of this is not needed at all to build a useful agent! | |
| In fact, mini-SWE-agent | |
| - **Does not have any tools other than bash** — it doesn't even use the tool-calling interface of the LMs. | |
| This means that you can run it with literally any model. When running in sandboxed environments you also don't need to take care | |
| of installing a single package — all it needs is bash. | |
| - **Has a completely linear history** — every step of the agent just appends to the messages and that's it. | |
| So there's no difference between the trajectory and the messages that you pass on to the LM. | |
| Great for debugging & fine-tuning. | |
| - **Executes actions with `subprocess.run`** — every action is completely independent (as opposed to keeping a stateful shell session running). | |
| This makes it trivial to execute the actions in sandboxes (literally just switch out `subprocess.run` with `docker exec`) and to | |
| scale up effortlessly. Seriously, this is [a big deal](https://mini-swe-agent.com/latest/faq/#why-no-shell-session), trust me. | |
| This makes it perfect as a baseline system and for a system that puts the language model (rather than | |
| the agent scaffold) in the middle of our attention. | |
| </details> | |
| <details> | |
| <summary>More motivation (as a tool)</summary> | |
| Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters. | |
| `mini` wants to be a hackable tool, not a black box. | |
| - **Simple** enough to understand at a glance | |
| - **Convenient** enough to use in daily workflows | |
| - **Flexible** to extend | |
| Unlike other agents (including our own [swe-agent](https://swe-agent.com/latest/)), it is radically simpler, because it: | |
| - **Does not have any tools other than bash** — it doesn't even use the tool-calling interface of the LMs. | |
| Instead of implementing custom tools for every specific thing the agent might want to do, the focus is fully on the LM utilizing the shell to its full potential. | |
| Want it to do something specific like opening a PR? | |
| Just tell the LM to figure it out rather than spending time to implement it in the agent. | |
| - **Executes actions with `subprocess.run`** — every action is completely independent (as opposed to keeping a stateful shell session running). | |
| This i | |
| </output_head> | |
| <elided_chars> | |
| 378 characters elided | |
| </elided_chars> | |
| <output_tail> | |
| >Should I use SWE-agent or mini-SWE-agent?</summary> | |
| You should use `mini-swe-agent` if | |
| - You want a quick command line tool that works locally | |
| - You want an agent with a very simple control flow | |
| - You want even faster, simpler & more stable sandboxing & benchmark evaluations | |
| - You are doing FT or RL and don't want to overfit to a specific agent scaffold | |
| You should use `swe-agent` if | |
| - You need specific tools or want to experiment with different tools | |
| - You want to experiment with different history processors | |
| - You want very powerful yaml configuration without touching code | |
| What you get with both | |
| - Excellent performance on SWE-Bench | |
| - A trajectory browser | |
| </details> | |
| <table> | |
| <tr> | |
| <td width="50%"> | |
| <a href="https://mini-swe-agent.com/latest/usage/mini/"><strong>Simple UI</strong></a> (<code>mini</code>) | |
| </td> | |
| <td> | |
| <a href="https://mini-swe-agent.com/latest/usage/mini_v/"><strong>Visual UI</strong></a> (<code>mini -v</code>) | |
| </td> | |
| </tr> | |
| <tr> | |
| <td width="50%"> | |
|  | |
| </td> | |
| <td> | |
|  | |
| </td> | |
| </tr> | |
| <tr> | |
| <td> | |
| <a href="https://mini-swe-agent.com/latest/usage/swebench/"><strong>Batch inference</strong></a> | |
| </td> | |
| <td> | |
| <a href="https://mini-swe-agent.com/latest/usage/inspector/"><strong>Trajectory browser</strong></a> | |
| </td> | |
| <tr> | |
| <tr> | |
| <td> | |
|  | |
| </td> | |
| <td> | |
|  | |
| </td> | |
| </tr> | |
| <td> | |
| <a href="https://mini-swe-agent.com/latest/advanced/cookbook/"><strong>Python bindings</strong></a> | |
| </td> | |
| <td> | |
| <a href="https://mini-swe-agent.com"><strong>More in the docs</strong></a> | |
| </td> | |
| </tr> | |
| <tr> | |
| <td> | |
| ```python | |
| agent = DefaultAgent( | |
| LitellmModel(model_name=...), | |
| LocalEnvironment(), | |
| ) | |
| agent.run("Write a sudoku game") | |
| ``` | |
| </td> | |
| <td> | |
| * [Quick start](https://mini-swe-agent.com/latest/quickstart/) | |
| * [`mini`](https://mini-swe-agent.com/latest/usage/mini/) | |
| * [FAQ](https://mini-swe-agent.com/latest/faq/) | |
| * [Configuration](https://mini-swe-agent.com/latest/advanced/configuration/) | |
| * [Power up](https://mini-swe-agent.com/latest/advanced/cookbook/) | |
| </td> | |
| </tr> | |
| </table> | |
| ## Let's get started! | |
| Option 1: Install + run in virtual environment | |
| ```bash | |
| pip install uv && uvx mini-swe-agent [-v] | |
| # or | |
| pip install pipx && pipx ensurepath && pipx run mini-swe-agent [-v] | |
| ``` | |
| Option 2: Install in current environment | |
| ```bash | |
| pip install mini-swe-agent && mini [-v] | |
| ``` | |
| Option 3: Install from source | |
| ```bash | |
| git clone https://github.com/SWE-agent/mini-swe-agent.git | |
| cd mini-swe-agent | |
| pip install -e . | |
| mini [-v] | |
| ``` | |
| Read more in our [documentation](https://mini-swe-agent.com/latest/): | |
| * [Quick start guide](https://mini-swe-agent.com/latest/quickstart/) | |
| * More on [`mini`](https://mini-swe-agent.com/latest/usage/mini/) and [`mini -v`](https://mini-swe-agent.com/latest/usage/mini_v/) | |
| * [Configuration](https://mini-swe-agent.com/latest/advanced/configuration/) | |
| * [Power up with the cookbook](https://mini-swe-agent.com/latest/advanced/cookbook/) | |
| * [FAQ](https://mini-swe-agent.com/latest/faq/) | |
| * [Contribute!](https://mini-swe-agent.com/latest/contributing/) | |
| ## Bottom line | |
| If you found this work helpful, please consider citing the [SWE-agent paper](https://arxiv.org/abs/2405.15793) in your work: | |
| ```bibtex | |
| @inproceedings{yang2024sweagent, | |
| title={{SWE}-agent: Agent-Computer Interfaces Enable Automated Software Engineering}, | |
| author={John Yang and Carlos E Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik R Narasimhan and Ofir Press}, | |
| booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}, | |
| year={2024}, | |
| url={https://arxiv.org/abs/2405.15793} | |
| } | |
| ``` | |
| More agentic AI: | |
| <div align="center"> | |
| <a href="https://github.com/SWE-agent/SWE-agent"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/sweagent_logo_text_below.svg" | |
| alt="SWE-agent" height="120px"></a> | |
| | |
| <a href="https://github.com/SWE-agent/SWE-ReX"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swerex_logo_text_below.svg" alt="SWE-ReX" | |
| height="120px"></a> | |
| | |
| <a href="https://github.com/SWE-bench/SWE-bench"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swebench_logo_text_below.svg" | |
| alt="SWE-bench" height="120px"></a> | |
| | |
| <a href="https://github.com/SWE-bench/SWE-smith"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swesmith_logo_text_below.svg" | |
| alt="SWE-smith" height="120px"></a> | |
| | |
| <a href="https://github.com/SWE-bench/sb-cli"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/sbcli_logo_text_below.svg" alt="sb-cli" | |
| height="120px"></a> | |
| </div> | |
| </output_tail> | |
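| The README's `subprocess.run` point is worth seeing concretely; a minimal sketch (these helper names are hypothetical, not the project's actual API): | |
| ```python | |
| import subprocess | |
| def run_local(command: str, timeout: int = 60) -> str: | |
|     # Every action is an independent process call; no stateful shell session. | |
|     return subprocess.run(["bash", "-c", command], capture_output=True, text=True, timeout=timeout).stdout | |
| def run_in_docker(container: str, command: str, timeout: int = 60) -> str: | |
|     # Sandboxing is just a different argv prefix around the same call. | |
|     return subprocess.run(["docker", "exec", container, "bash", "-c", command], capture_output=True, text=True, timeout=timeout).stdout | |
| ``` | |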
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 5, $0.03): | |
| THOUGHT: Now I have a good understanding of the README.md content. The issue asks me to let the TDD Reviewer review the README.md file. Since this is a documentation file | |
| rather than code, I should delegate this task to the TDD Reviewer subagent to get their perspective on whether the README follows proper practices and if there are any | |
| improvements needed from a TDD methodology standpoint. | |
| ```bash | |
| echo "MINI_SWE_AGENT_SPAWN_CHILD::tdd-reviewer | |
| Please review the README.md file in this repository. The README describes mini-swe-agent, a 100-line AI agent that solves GitHub issues. | |
| Please evaluate: | |
| 1. Whether the README properly describes the project's testing approach and methodology | |
| 2. If the documentation aligns with TDD principles and best practices | |
| 3. Any suggestions for improving the README from a TDD perspective | |
| 4. Whether the project structure and documentation support proper test-driven development workflows | |
| The README is quite comprehensive and includes information about the project's goals, features, installation instructions, and comparisons with other tools. Please | |
| provide feedback on how well it supports TDD practices and any improvements you'd recommend." | |
| ``` | |
| ━━━ Spawning tdd-reviewer as ROOT::1-tdd-reviewer ━━━ | |
| Task: Please review the README.md file in this repository. The README describes mini-swe-agent, a 100-line AI agent that solves GitHub issues. | |
| Please evaluate: | |
| 1. Whether the README properly describes the project's testing approach and methodology | |
| 2. If the documentation aligns with TDD principles and best practices | |
| 3. Any suggestions for improving the README from a TDD perspective | |
| 4. Whether the project structure and documentation support proper test-driven development workflows | |
| The README is quite comprehensive and includes information about the project's goals, features, installation instructions, and comparisons with other tools. Please | |
| provide feedback on how well it supports TDD practices and any improvements you'd recommend. | |
| System: | |
| You are a helpful assistant that can interact with a computer. | |
| Your response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||). | |
| Include a THOUGHT section before your command where you explain your reasoning process. | |
| Format your response as shown in <format_example>. | |
| <format_example> | |
| Your reasoning and analysis here. Explain why you want to perform the action. | |
| ```bash | |
| your_command_here | |
| ``` | |
| </format_example> | |
| Failure to follow these rules will cause your response to be rejected. | |
| To finish, issue the following command: `echo MINI_SWE_AGENT_FINAL_OUTPUT` | |
| without any other command. | |
| <manager_system_template> | |
| <subagent_registry> | |
| Available subagents: | |
| - tdd-reviewer: Ask TDD Reviewer to review when you are planning (concrete todos), changing code, or both. TDD Reviewer helps ensure that your todos or code changes | |
| follow proper TDD red-green-refactor-commit methodology. You can ask TDD Reviewer to either 1) review code or 2) review todos. | |
| To delegate a task to a specific subagent, use: | |
| echo "MINI_SWE_AGENT_SPAWN_CHILD::subagent-name | |
| Task description for the subagent" | |
| You MUST specify a subagent name. Generic spawning is not allowed. | |
| </subagent_registry> | |
| </manager_system_template> | |
| You are a TDD reviewer agent. You are an expert in reviewing whether a code change or implementation strictly follows TDD's red-green-refactor-commit development style. | |
| # Task: review code | |
| You are reviewing code changes. You must determine if this code change violates TDD principles. | |
| **IMPORTANT**: First identify if this is a test file or implementation file by checking the file path for `.test.`, `.spec.`, or `test/`. | |
| # Task: review plan | |
| You are reviewing a plan, which is an ordered list of concrete todos. You must determine whether the plan violates TDD principles. A plan contains some number of | |
| features to implement; each feature has some number of RED-GREEN-REFACTOR-COMMIT cycles. | |
| <plan-format> | |
| <feature-1> | |
| Title: Clear, action-oriented summary that immediately conveys what needs to be done. Use format like "Add user authentication to checkout flow" rather than | |
| "Authentication problem." | |
| Description: Brief context explaining why this work matters and what problem it solves. Include the user impact or business value. | |
| Acceptance: Specific, testable conditions that define "done." Write these as bulleted scenarios or given/when/then statements. This prevents scope creep and alignment | |
| issues later. | |
| Technical Details: Implementation notes, API endpoints, database changes, or architectural considerations. Include relevant code snippets, wireframes, or technical | |
| constraints. | |
| Dependencies: Other issues, external services, or team coordination required before starting work. | |
| Implementation roadmap: | |
| * RED: ... | |
| * GREEN: ... | |
| * REFACTOR: ... | |
| * COMMIT: ... | |
| * RED: ... | |
| * GREEN: ... | |
| * REFACTOR: ... | |
| * COMMIT: ... | |
| </feature-1> | |
| <feature-2> | |
| Title: ... | |
| Description: ... | |
| Acceptance: ... | |
| Technical Details: ... | |
| Dependencies: ... | |
| Implementation roadmap: ... | |
| * RED: ... | |
| * GREEN: ... | |
| * REFACTOR: ... | |
| * COMMIT: ... | |
| </feature-2> | |
| </plan-format> | |
| If the plan violates TDD, provide accurate and comprehensive information on how the plan violates TDD and how to fix the violations. | |
| # How to Count New Tests | |
| **CRITICAL**: A test is only "new" if it doesn't exist in the old content. | |
| 1. **Compare old content vs new content character by character** | |
| - Find test declarations: `test(`, `it(`, `describe(` | |
| - A test that exists in both old and new is NOT new | |
| - Only count tests that appear in new but not in old | |
| - Count the NUMBER of new tests added, not the total tests in the file | |
| 2. **What counts as a new test:** | |
| - A test block that wasn't in the old content | |
| - NOT: Moving an existing test to a different location | |
| - NOT: Renaming an existing test | |
| - NOT: Reformatting or refactoring existing tests | |
| 3. **Multiple test check:** | |
| - One new test = Allowed (part of TDD cycle) | |
| - Two or more new tests = Violation | |
| **Example**: If old content has 1 test and new content has 2 tests, that's adding 1 new test (allowed), NOT 2 tests total. | |
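| Rendered as a pytest-style sketch for this repository (hypothetical names; the prompt's own declarations are JS-style `test(`/`it(`): | |
| ```python | |
| from myapp.auth import login  # hypothetical module | |
| # Old content had exactly one test: | |
| def test_login_succeeds(): | |
|     assert login("user", "secret") is True | |
| # New content keeps it and adds exactly ONE new test (allowed). | |
| # Adding a second new test in the same change would be a violation. | |
| def test_login_rejects_bad_password(): | |
|     assert login("user", "wrong") is False | |
| ``` | |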
| # Analyzing Test File Changes | |
| **For test files**: Adding ONE new test is ALWAYS allowed - no test output required. This is the foundation of TDD. | |
| # Analyzing Implementation File Changes | |
| **For implementation files**: | |
| 1. **Check the test output** to understand the current failure | |
| 2. **Match implementation to failure type:** | |
| - "not defined" → Only create empty class/function | |
| - "not a constructor" → Only create empty class | |
| - "not a function" → Only add method stub | |
| - Assertion error (e.g., "expected 0 to be 4") → Implement minimal logic to make it pass | |
| 3. **Verify minimal implementation:** | |
| - Don't add extra methods | |
| - Don't add error handling unless tested | |
| - Don't implement features beyond current test | |
| # Example Analysis | |
| **Scenario**: Test fails with "Calculator is not defined" | |
| - Allowed: Add `export class Calculator {}` | |
| - Violation: Add `export class Calculator { add(a, b) { return a + b; } }` | |
| - **Reason**: Should only fix "not defined", not implement methods | |
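| The same analysis as a Python sketch (hedged; the prompt's own example is JS, and this class is hypothetical): | |
| ```python | |
| # Failure: "NameError: name 'Calculator' is not defined" | |
| # Allowed minimal step: define the name and nothing else. | |
| class Calculator: | |
|     pass | |
| # Violation: implementing behavior no failing test has demanded yet, e.g. | |
| # class Calculator: | |
| #     def add(self, a, b): | |
| #         return a + b | |
| ``` | |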
| User: | |
| Your task: Please review the README.md file in this repository. The README describes mini-swe-agent, a 100-line AI agent that solves GitHub issues. | |
| Please evaluate: | |
| 1. Whether the README properly describes the project's testing approach and methodology | |
| 2. If the documentation aligns with TDD principles and best practices | |
| 3. Any suggestions for improving the README from a TDD perspective | |
| 4. Whether the project structure and documentation support proper test-driven development workflows | |
| The README is quite comprehensive and includes information about the project's goals, features, installation instructions, and comparisons with other tools. Please | |
| provide feedback on how well it supports TDD practices and any improvements you'd recommend. Please reply with a single shell command in triple backticks. To finish, the | |
| first line of the output of the shell command must be 'MINI_SWE_AGENT_FINAL_OUTPUT'. | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 6, $0.04): | |
| I need to examine the README.md file to evaluate how well it describes the project's testing approach and whether it aligns with TDD principles. | |
| ```bash | |
| cat README.md | |
| ``` | |
| User: | |
| Observation: {'output': '<div align="center">\n<a href="https://mini-swe-agent.com/latest/"><img | |
| src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/mini-swe-agent-banner.svg" alt="mini-swe-agent banner" style="height: 7em"/></a>\n</div>\n\n<h1>The | |
| 100 line AI agent that solves GitHub issues & | |
| more</h1>\n\n[](https://mini-swe-agent.com/latest/)\n[](https://join.slack.com/t/swe-bench/shared_invite/zt-36pj9bu5s-o3_yXPZbaH2wVnxns | |
| s1EkQ)\n[](https://pypi.org/project/mini-swe- | |
| agent/)\n\nIn 2024, [SWE-bench](https://github.com/swe-bench/SWE-bench) & [SWE-agent](https://github.com/swe-agent/swe-agent) helped kickstart the coding agent | |
| revolution.\n\nWe now ask: **What if SWE-agent was 100x smaller, and still worked nearly as well?**\n\n`mini` is for\n\n- **Researchers** who want to **benchmark, | |
| fine-tune or RL** without assumptions, bloat, or surprises\n- **Developers** who like their tools like their scripts: **short, sharp, and readable**\n- **Engineers** who | |
| want something **trivial to sandbox & to deploy anywhere**\n\nHere\'s some details:\n\n- **Minimal**: Just [100 lines of | |
| python](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/agents/default.py) (+100 total for | |
| [env](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/environments/local.py),\n[model](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/mi | |
| nisweagent/models/litellm_model.py), [script](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/run/hello_world.py)) — no fancy dependencies!\n- | |
| **Powerful:** Resolves 65% of GitHub issues in the [SWE-bench verified benchmark](https://www.swebench.com/) (with Claude Sonnet 4).\n- **Convenient:** Comes with UIs | |
| that turn this into your daily dev swiss army knife!\n- **Deployable:** In addition to local envs, you can use **docker**, **podman**, **singularity**, **apptainer**, and | |
| more\n- **Tested:** [](https://codecov.io/gh/SWE-agent/mini-swe-agent)\n- | |
| **Cutting edge:** Built by the Princeton & Stanford team behind [SWE-bench](https://swebench.com) and [SWE-agent](https://swe-agent.com).\n\n<details>\n\n<summary>More | |
| motivation (for research)</summary>\n\n[SWE-agent](https://swe-agent.com/latest/) jump-started the development of AI agents in 2024. Back then, we placed a lot of | |
| emphasis on tools and special interfaces for the agent.\nHowever, one year later, as LMs have become more capable, a lot of this is not needed at all to build a useful | |
| agent!\nIn fact, mini-SWE-agent\n\n- **Does not have any tools other than bash** — it doesn\'t even use the tool-calling interface of the LMs.\n This means that you can | |
| run it with literally any model. When running in sandboxed environments you also don\'t need to take care\n of installing a single package — all it needs is bash.\n- | |
| **Has a completely linear history** — every step of the agent just appends to the messages and that\'s it.\n So there\'s no difference between the trajectory and the | |
| messages that you pass on to the LM.\n Great for debugging & fine-tuning.\n- **Executes actions with `subprocess.run`** — every action is completely independent (as | |
| opposed to keeping a stateful shell session running).\n This makes it trivial to execute the actions in sandboxes (literally just switch out `subprocess.run` with | |
| `docker exec`) and to\n scale up effortlessly. Seriously, this is [a big deal](https://mini-swe-agent.com/latest/faq/#why-no-shell-session), trust me.\n\nThis makes it | |
| perfect as a baseline system and for a system that puts the language model (rather than\nthe agent scaffold) in the middle of our | |
| attention.\n\n</details>\n\n<details>\n<summary>More motivation (as a tool)</summary>\n\nSome agents are overfitted research artifacts. Others are UI-heavy frontend | |
| monsters.\n\n`mini` wants to be a hackable tool, not a black box.\n\n- **Simple** enough to understand at a glance\n- **Convenient** enough to use in daily workflows\n- | |
| **Flexible** to extend\n\nUnlike other agents (including our own [swe-agent](https://swe-agent.com/latest/)), it is radically simpler, because it:\n\n- **Does not have | |
| any tools other than bash** — it doesn\'t even use the tool-calling interface of the LMs.\n Instead of implementing custom tools for every specific thing the agent might | |
| want to do, the focus is fully on the LM utilizing the shell to its full potential.\n Want it to do something specific like opening a PR?\n Just tell the LM to figure | |
| it out rather than spending time to implement it in the agent.\n- **Executes actions with `subprocess.run`** — every action is completely independent (as opposed to | |
| keeping a stateful shell session running).\n This is [a big deal](https://mini-swe-agent.com/latest/faq/#why-no-shell-session) for the stability of the agent, trust | |
| me.\n- **Has a completely linear history** — every step of the agent just appends to the messages that are passed to the LM in the next step and that\'s it.\n This is | |
| great for debugging and understanding what the LM is prompted with.\n\n</details>\n\n<details>\n<summary>Should I use SWE-agent or mini-SWE-agent?</summary>\n\nYou should | |
| use `mini-swe-agent` if\n\n- You want a quick command line tool that works locally\n- You want an agent with a very simple control flow\n- You want even faster, simpler & | |
| more stable sandboxing & benchmark evaluations\n- You are doing FT or RL and don\'t want to overfit to a specific agent scaffold\n\nYou should use `swe-agent` if\n\n- You | |
| need specific tools or want to experiment with different tools\n- You want to experiment with different history processors\n- You want very powerful yaml configuration | |
| without touching code\n\nWhat you get with both\n\n- Excellent performance on SWE-Bench\n- A trajectory browser\n\n</details>\n\n<table>\n<tr>\n<td width="50%">\n<a | |
| href="https://mini-swe-agent.com/latest/usage/mini/"><strong>Simple UI</strong></a> (<code>mini</code>)\n</td>\n<td>\n<a | |
| href="https://mini-swe-agent.com/latest/usage/mini_v/"><strong>Visual UI</strong></a> (<code>mini -v</code>)\n</td>\n</tr>\n<tr>\n<td width="50%">\n\n | |
| \n\n</td>\n<td>\n\n | |
| \n\n</td>\n</tr>\n<tr>\n <td>\n <a | |
| href="https://mini-swe-agent.com/latest/usage/swebench/"><strong>Batch inference</strong></a>\n </td>\n <td>\n <a | |
| href="https://mini-swe-agent.com/latest/usage/inspector/"><strong>Trajectory browser</strong></a>\n | |
| </td>\n<tr>\n<tr>\n\n<td>\n\n\n\n</td>\n\n<td>\n\n\n\n</td>\n\n</tr>\n<td>\n<a | |
| href="https://mini-swe-agent.com/latest/advanced/cookbook/"><strong>Python bindings</strong></a>\n</td>\n<td>\n<a href="https://mini-swe-agent.com"><strong>More in the | |
| docs</strong></a>\n</td>\n</tr>\n<tr>\n<td>\n\n```python\nagent = DefaultAgent(\n LitellmModel(model_name=...),\n LocalEnvironment(),\n)\nagent.run("Write a sudoku | |
| game")\n```\n</td>\n<td>\n\n* [Quick start](https://mini-swe-agent.com/latest/quickstart/)\n* [`mini`](https://mini-swe-agent.com/latest/usage/mini/)\n* | |
| [FAQ](https://mini-swe-agent.com/latest/faq/)\n* [Configuration](https://mini-swe-agent.com/latest/advanced/configuration/)\n* [Power | |
| up](https://mini-swe-agent.com/latest/advanced/cookbook/)\n\n</td>\n</tr>\n</table>\n\n## Let\'s get started!\n\nOption 1: Install + run in virtual | |
| environment\n\n```bash\npip install uv && uvx mini-swe-agent [-v]\n# or\npip install pipx && pipx ensurepath && pipx run mini-swe-agent [-v]\n```\n\nOption 2: Install in | |
| current environment\n\n```bash\npip install mini-swe-agent && mini [-v]\n```\n\nOption 3: Install from source\n\n```bash\ngit clone | |
| https://github.com/SWE-agent/mini-swe-agent.git\ncd mini-swe-agent\npip install -e .\nmini [-v]\n```\n\nRead more in our | |
| [documentation](https://mini-swe-agent.com/latest/):\n\n* [Quick start guide](https://mini-swe-agent.com/latest/quickstart/)\n* More on | |
| [`mini`](https://mini-swe-agent.com/latest/usage/mini/) and [`mini -v`](https://mini-swe-agent.com/latest/usage/mini_v/)\n* | |
| [Configuration](https://mini-swe-agent.com/latest/advanced/configuration/)\n* [Power up with the cookbook](https://mini-swe-agent.com/latest/advanced/cookbook/)\n* | |
| [FAQ](https://mini-swe-agent.com/latest/faq/)\n* [Contribute!](https://mini-swe-agent.com/latest/contributing/)\n\n## Bottom line\n\nIf you found this work helpful, | |
| please consider citing the [SWE-agent paper](https://arxiv.org/abs/2405.15793) in your work:\n\n```bibtex\n@inproceedings{yang2024sweagent,\n title={{SWE}-agent: | |
| Agent-Computer Interfaces Enable Automated Software Engineering},\n author={John Yang and Carlos E Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and | |
| Karthik R Narasimhan and Ofir Press},\n booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},\n year={2024},\n | |
| url={https://arxiv.org/abs/2405.15793}\n}\n```\n\nMore agentic AI:\n\n<div align="center">\n <a href="https://github.com/SWE-agent/SWE-agent"><img | |
| src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/sweagent_logo_text_below.svg" alt="SWE-agent" height="120px"></a>\n \n <a | |
| href="https://github.com/SWE-agent/SWE-ReX"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swerex_logo_text_below.svg" alt="SWE-ReX" | |
| height="120px"></a>\n \n <a href="https://github.com/SWE-bench/SWE-bench"><img | |
| src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swebench_logo_text_below.svg" alt="SWE-bench" height="120px"></a>\n \n <a | |
| href="https://github.com/SWE-bench/SWE-smith"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/swesmith_logo_text_below.svg" alt="SWE-smith" | |
| height="120px"></a>\n \n <a href="https://github.com/SWE-bench/sb-cli"><img | |
| src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/sbcli_logo_text_below.svg" alt="sb-cli" height="120px"></a>\n</div>\n\n', 'returncode': 0} | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 7, $0.06): | |
| Now I need to examine the project structure to understand the testing approach and see if there are any test files or testing documentation. | |
| ```bash | |
| find . -name "*.py" -path "*/test*" -o -name "*test*.py" -o -name "test_*.py" | head -20 && echo "---" && find . -name "pytest.ini" -o -name "pyproject.toml" -o -name | |
| "setup.cfg" | head -10 && echo "---" && ls -la | |
| ``` | |
| User: | |
| Observation: {'output': | |
| './.mini-env/lib/python3.13/site-packages/aiohttp/test_utils.py\n./.mini-env/lib/python3.13/site-packages/aiohttp/pytest_plugin.py\n./.mini-env/lib/python3.13/site-packag | |
| es/annotated_types/test_cases.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/mp_preload.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/_ | |
| _init__.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_spawn/test_misc.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tes | |
| ts/test_multiprocessing_spawn/test_processes.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_spawn/__init__.py\n./.mini-env/lib/pytho | |
| n3.13/site-packages/multiprocess/tests/test_multiprocessing_spawn/test_threads.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_spawn/ | |
| test_manager.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/mp_fork_bomb.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiproces | |
| sing_forkserver/test_misc.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_forkserver/test_processes.py\n./.mini-env/lib/python3.13/si | |
| te-packages/multiprocess/tests/test_multiprocessing_forkserver/__init__.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_forkserver/te | |
| st_threads.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_forkserver/test_manager.py\n./.mini-env/lib/python3.13/site-packages/multi | |
| process/tests/test_multiprocessing_fork/test_misc.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_fork/test_processes.py\n./.mini-env | |
| /lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_fork/__init__.py\n./.mini-env/lib/python3.13/site-packages/multiprocess/tests/test_multiprocessing_f | |
| ork/test_threads.py\n---\n./.mini-env/lib/python3.13/site-packages/numpy/f2py/setup.cfg\n./.mini-env/lib/python3.13/site-packages/pandas/pyproject.toml\n./pyproject.toml\ | |
| n---\ntotal 80\ndrwxr-xr-x@ 19 j staff 608 Jul 30 15:43 .\ndrwxr-xr-x@ 31 j staff 992 Jul 25 03:21 ..\ndrwxr-xr-x@ 5 j staff 160 Jul 30 15:44 | |
| .claude\ndrwxr-xr-x@ 3 j staff 96 Jul 25 03:21 .cursor\ndrwxr-xr-x@ 15 j staff 480 Jul 31 14:48 .git\ndrwxr-xr-x@ 10 j staff 320 Jul 28 23:13 | |
| .github\n-rw-r--r--@ 1 j staff 3569 Jul 30 16:00 .gitignore\nlrwxr-xr-x@ 1 j staff 8 Jul 26 20:47 .mini -> .claude/\ndrwxr-xr-x@ 7 j staff 224 Jul 26 | |
| 23:45 .mini-env\n-rw-r--r--@ 1 j staff 956 Jul 30 15:43 .pre-commit-config.yaml\ndrwxr-xr-x@ 6 j staff 192 Jul 25 22:42 .pytest_cache\ndrwxr-xr-x@ 3 j staff | |
| 96 Jul 25 03:21 .vscode\ndrwxr-xr-x@ 15 j staff 480 Jul 31 14:03 docs\n-rw-r--r--@ 1 j staff 1094 Jul 25 03:21 LICENSE.md\n-rw-r--r--@ 1 j staff 4453 Jul 28 | |
| 23:13 mkdocs.yml\n-rw-r--r--@ 1 j staff 6668 Jul 26 23:38 pyproject.toml\n-rw-r--r--@ 1 j staff 10394 Jul 30 15:43 README.md\ndrwxr-xr-x@ 4 j staff 128 Jul 25 | |
| 11:32 src\ndrwxr-xr-x@ 12 j staff 384 Jul 26 23:39 tests\n', 'returncode': 0} | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 8, $0.06): | |
| Let me examine the actual test directory and pyproject.toml to understand the testing setup. | |
| ```bash | |
| ls -la tests/ && echo "---" && cat pyproject.toml | grep -A 20 -B 5 "test\|pytest" | |
| ``` | |
| User: | |
| Observation: {'output': 'total 8\n-rw-r--r--@ 1 j staff 0 Jul 25 03:21 __init__.py\ndrwxr-xr-x@ 5 j staff 160 Jul 28 23:16 __pycache__\ndrwxr-xr-x@ 12 j staff | |
| 384 Jul 26 23:39 .\ndrwxr-xr-x@ 19 j staff 608 Jul 30 15:43 ..\ndrwxr-xr-x@ 7 j staff 224 Jul 28 23:13 agents\ndrwxr-xr-x@ 5 j staff 160 Jul 25 23:07 | |
| config\n-rw-r--r--@ 1 j staff 2020 Jul 26 23:38 conftest.py\ndrwxr-xr-x@ 8 j staff 256 Jul 25 23:07 environments\ndrwxr-xr-x@ 9 j staff 288 Jul 28 23:13 | |
| models\ndrwxr-xr-x@ 13 j staff 416 Jul 28 23:13 run\ndrwxr-xr-x@ 3 j staff 96 Jul 26 21:41 subagents\ndrwxr-xr-x@ 7 j staff 224 Jul 25 20:03 test_data\n---\n | |
| "swe-rex",\n]\n\ndev = [\n "datasets",\n "pytest",\n "pytest-cov",\n "pytest-asyncio",\n "pytest-xdist",\n "pre-commit",\n "ruff",\n | |
| "mkdocs-include-markdown-plugin",\n "mkdocstrings[python]>=0.18",\n "mike",\n "mkdocs-material",\n "mkdocs-glightbox",\n]\n\n[project.urls]\nDocumentation = | |
| "https://mini-swe-agent.com/latest/"\nRepository = "https://github.com/SWE-agent/mini-SWE-agent"\n"Bug Tracker" = | |
| "https://github.com/SWE-agent/mini-SWE-agent/issues"\n\n[project.scripts]\nmini = "minisweagent.run.mini:app"\nmini-swe-agent = "minisweagent.run.mini:app"\nmini-extra = | |
| "minisweagent.run.mini_extra:main"\nmini-e= "minisweagent.run.mini_extra:main"\n\n[tool.setuptools]\ninclude-package-data = true\n\n[tool.setuptools.dynamic]\nversion = | |
| {attr = "minisweagent.__version__"}\n\n[tool.setuptools.packages.find]\nwhere = ["src"]\ninclude = ["minisweagent*"]\n\n[tool.setuptools.package-data]\n--\n | |
| ".ipynb_checkpoints",\n ".mypy_cache",\n ".nox",\n ".pants.d",\n ".pyenv",\n ".pytest_cache",\n ".pytype",\n ".ruff_cache",\n ".svn",\n | |
| ".tox",\n ".venv",\n ".vscode",\n "__pypackages__",\n "_build",\n "buck-out",\n "build",\n "dist",\n "node_modules",\n "site-packages",\n | |
| "venv",\n # ---- project specific ----\n "tests/test_data",\n # Exclude commands so they don\'t get the __future__ imports\n | |
| "config/commands",\n]\n\nline-length = 120\nindent-width = 4\n\ntarget-version = "py310"\n\n[tool.ruff.lint]\n# Enable Pyflakes (`F`) and a subset of the pycodestyle | |
| (`E`) codes by default.\n# Unlike Flake8, Ruff doesn\'t enable pycodestyle warnings (`W`) or\n# McCabe complexity (`C901`) by default.\n# I001: Isort, I002: required | |
| import\nselect = [\n # Error (E)\n "E",\n # Error (PLE)\n "PLE",\n # pycodestyle\n--\n "EM",\n # flake8-return\n "RET",\n # RUF\n "RUF019", | |
| # unneded key in dict check\n # pytest\n "PT",\n # flake8-simplify (SIM)\n "SIM201",\n # flake8-use-pathlib\n "PTH100",\n "PTH110",\n "PTH111",\n | |
| "PTH112",\n "PTH113",\n "PTH114",\n "PTH117",\n "PTH118",\n "PTH119",\n "PTH120",\n "PTH121",\n "PTH122",\n "PTH202",\n "PTH203",\n | |
| "PTH204",\n "PTH205",\n--\n # error (E)\n "E501", # line too long\n "E402", # import not on top of file\n "E722", # bare except\n "E741", # | |
| ambiguous symbol\n # pytest\n "PT011",\n "PT018",\n # flake8-errmsg\n "EM101", # exception must not use a string literal\n "EM102", # exception must | |
| not use an f-string literal\n "EM103", # exception must not use a .format(...) string directly\n]\n\n# Allow fix for all enabled rules (when `--fix`) is | |
| provided.\nfixable = ["ALL"]\nunfixable = []\n\n# Allow unused variables when underscore-prefixed.\ndummy-variable-rgx = | |
| "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"\n\n[tool.ruff.format]\n# Like Black, use double quotes for strings.\nquote-style = "double"\n\n# Like Black, indent with spaces, | |
| rather than tabs.\n--\n "unspecified-encoding", # W1514 - disable unspecified encoding warnings\n "duplicate-code", # R0801 - disable code duplication | |
| checks\n "too-few-public-methods", # R0903 - disable too few public methods warnings\n]\n\n[tool.pytest.ini_options]\nasyncio_mode = | |
| "auto"\nasyncio_default_fixture_loop_scope = "function"\nmarkers = [\n "slow: marks tests as slow (deselect with \'-k \\"not slow\\"\')",\n]\n', 'returncode': 0} | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 9, $0.07): | |
| Let me examine some actual test files to understand the testing structure and approach. | |
| ```bash | |
| find tests/ -name "*.py" | head -10 && echo "---" && ls tests/agents/ && echo "---" && cat tests/conftest.py | |
| ``` | |
| User: | |
| Observation: {'output': | |
| 'tests/conftest.py\ntests/config/__init__.py\ntests/config/test_swebench_template.py\ntests/environments/test_singularity.py\ntests/environments/__init__.py\ntests/enviro | |
| nments/extra/test_swerex_docker.py\ntests/environments/extra/__init__.py\ntests/environments/test_local.py\ntests/environments/test_docker.py\ntests/test_data/update_outp | |
| uts_local.py\n---\n__init__.py\n__pycache__\ntest_default.py\ntest_interactive_textual.py\ntest_interactive.py\n---\nimport json\nimport threading\nfrom pathlib import | |
| Path\n\nimport pytest\n\nfrom minisweagent.models import GLOBAL_MODEL_STATS\n\n# Global lock for tests that modify global state - this works across | |
| threads\n_global_stats_lock = threading.Lock()\n\n\[email protected]\ndef reset_global_stats():\n """Reset global model stats and ensure exclusive access for tests that | |
| need it.\n\n This fixture should be used by any test that depends on global model stats\n to ensure thread safety and test isolation.\n """\n with | |
| _global_stats_lock:\n # Reset at start\n GLOBAL_MODEL_STATS._cost = 0.0 # noqa: protected-access\n GLOBAL_MODEL_STATS._n_calls = 0 # noqa: | |
| protected-access\n yield\n # Reset at end to clean up\n GLOBAL_MODEL_STATS._cost = 0.0 # noqa: protected-access\n GLOBAL_MODEL_STATS._n_calls | |
| = 0 # noqa: protected-access\n\n\ndef get_test_data(trajectory_name: str) -> dict[str, list[str]]:\n """Load test fixtures from a trajectory JSON file"""\n | |
| json_path = Path(__file__).parent / "test_data" / f"{trajectory_name}.traj.json"\n with json_path.open() as f:\n trajectory = json.load(f)\n\n # Extract | |
| model responses (assistant messages, starting from index 2)\n model_responses = []\n # Extract expected observations (user messages, starting from index 3)\n | |
| expected_observations = []\n\n for i, message in enumerate(trajectory):\n if i < 2: # Skip system message (0) and initial user message (1)\n | |
| continue\n\n if message["role"] == "assistant":\n model_responses.append(message["content"])\n elif message["role"] == "user":\n | |
| expected_observations.append(message["content"])\n\n return {"model_responses": model_responses, "expected_observations": | |
| expected_observations}\n\n\[email protected]\ndef github_test_data():\n """Load GitHub issue test fixtures"""\n return | |
| get_test_data("github_issue")\n\n\[email protected]\ndef local_test_data():\n """Load local test fixtures"""\n return get_test_data("local")\n', 'returncode': 0} | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 10, $0.08): | |
| Let me examine a specific test file to understand the testing patterns used. | |
| ```bash | |
| cat tests/agents/test_default.py | |
| ``` | |
| User: | |
| Observation: {'output': 'import pytest\n\nfrom minisweagent.agents.default import DefaultAgent, NonTerminatingException\nfrom minisweagent.environments.local import | |
| LocalEnvironment\nfrom minisweagent.models.test_models import DeterministicModel\n\n\ndef test_successful_completion():\n """Test agent completes successfully when | |
| MINI_SWE_AGENT_FINAL_OUTPUT is encountered."""\n agent = DefaultAgent(\n model=DeterministicModel(\n outputs=[\n "I\'ll echo a | |
| message\\n```bash\\necho \'hello world\'\\n```",\n "Now finishing\\n```bash\\necho \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho \'Task completed | |
| successfully\'\\n```",\n ]\n ),\n env=LocalEnvironment(),\n )\n\n exit_status, result = agent.run("Echo hello world then finish")\n | |
| assert exit_status == "Submitted"\n assert result == "Task completed successfully"\n assert agent.model.n_calls == 2\n assert len(agent.messages) == 6 # system, | |
| user, assistant, user, assistant, user\n\n\ndef test_step_limit_enforcement():\n """Test agent stops when step limit is reached."""\n agent = DefaultAgent(\n | |
| model=DeterministicModel(\n outputs=["First command\\n```bash\\necho \'step1\'\\n```", "Second command\\n```bash\\necho \'step2\'\\n```"]\n ),\n | |
| env=LocalEnvironment(),\n step_limit=1,\n )\n\n exit_status, _ = agent.run("Run multiple commands")\n assert exit_status == "LimitsExceeded"\n assert | |
| agent.model.n_calls == 1\n\n\ndef test_cost_limit_enforcement():\n """Test agent stops when cost limit is reached."""\n model = | |
| DeterministicModel(outputs=["```bash\\necho \'test\'\\n```"])\n model.cost = 1.0\n\n agent = DefaultAgent(\n model=model,\n env=LocalEnvironment(),\n | |
| cost_limit=0.5,\n )\n\n exit_status, _ = agent.run("Test cost limit")\n assert exit_status == "LimitsExceeded"\n\n\ndef test_format_error_handling():\n | |
| """Test agent handles malformed action formats properly."""\n agent = DefaultAgent(\n model=DeterministicModel(\n outputs=[\n "No code | |
| blocks here",\n "Multiple blocks\\n```bash\\necho \'first\'\\n```\\n```bash\\necho \'second\'\\n```",\n "Now correct\\n```bash\\necho | |
| \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho \'done\'\\n```",\n ]\n ),\n env=LocalEnvironment(),\n )\n\n exit_status, result = agent.run("Test | |
| format errors")\n assert exit_status == "Submitted"\n assert result == "done"\n assert agent.model.n_calls == 3\n # Should have error messages in | |
| conversation\n assert (\n len([msg for msg in agent.messages if "Please always provide EXACTLY ONE action" in msg.get("content", "")])\n == 2\n | |
| )\n\n\ndef test_timeout_handling():\n """Test agent handles command timeouts properly."""\n agent = DefaultAgent(\n model=DeterministicModel(\n | |
| outputs=[\n "Long sleep\\n```bash\\nsleep 5\\n```", # This will timeout\n "Quick finish\\n```bash\\necho | |
| \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho \'recovered\'\\n```",\n ]\n ),\n env=LocalEnvironment(timeout=1), # Very short timeout\n )\n\n | |
| exit_status, result = agent.run("Test timeout handling")\n assert exit_status == "Submitted"\n assert result == "recovered"\n # Should have timeout error | |
| message\n assert len([msg for msg in agent.messages if "timed out" in msg.get("content", "")]) == 1\n\n\ndef test_timeout_captures_partial_output():\n """Test that | |
| timeout error captures partial output from commands that produce output before timing out."""\n num1, num2 = 111, 9\n calculation_command = f"echo | |
| $(({num1}*{num2})); sleep 10"\n expected_output = str(num1 * num2)\n agent = DefaultAgent(\n model=DeterministicModel(\n outputs=[\n | |
| f"Output then sleep\\n```bash\\n{calculation_command}\\n```",\n "Quick finish\\n```bash\\necho \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho | |
| \'recovered\'\\n```",\n ]\n ),\n env=LocalEnvironment(timeout=1),\n )\n exit_status, result = agent.run("Test timeout with partial | |
| output")\n assert exit_status == "Submitted"\n assert result == "recovered" # final output should be `recovered` from the last command\n timed_out_messages = | |
| [msg for msg in agent.messages if "timed out" in msg.get("content", "")]\n assert len(timed_out_messages) == 1\n assert expected_output in | |
| timed_out_messages[0]["content"] # ensure timed out output is still captured\n\n\ndef test_parse_action_success():\n """Test action parsing works correctly for valid | |
| formats."""\n agent = DefaultAgent(\n model=DeterministicModel(outputs=[]),\n env=LocalEnvironment(),\n )\n\n # Test different valid formats\n | |
| result = agent.parse_action({"content": "```bash\\necho \'test\'\\n```"})\n assert result["action"] == "echo \'test\'"\n assert result["content"] == "```bash\\necho | |
| \'test\'\\n```"\n\n result = agent.parse_action({"content": "```bash\\nls -la\\n```"})\n assert result["action"] == "ls -la"\n assert result["content"] == | |
| "```bash\\nls -la\\n```"\n\n result = agent.parse_action({"content": "Some text\\n```bash\\necho \'hello\'\\n```\\nMore text"})\n assert result["action"] == "echo | |
| \'hello\'"\n assert result["content"] == "Some text\\n```bash\\necho \'hello\'\\n```\\nMore text"\n\n\ndef test_parse_action_failures():\n """Test action parsing | |
| raises appropriate exceptions for invalid formats."""\n agent = DefaultAgent(\n model=DeterministicModel(outputs=[]),\n env=LocalEnvironment(),\n | |
| )\n\n # No code blocks\n with pytest.raises(NonTerminatingException):\n agent.parse_action({"content": "No code blocks here"})\n\n # Multiple code | |
| blocks\n with pytest.raises(NonTerminatingException):\n agent.parse_action({"content": "```bash\\necho \'first\'\\n```\\n```bash\\necho \'second\'\\n```"})\n\n | |
| # Code block without bash language specifier\n with pytest.raises(NonTerminatingException):\n agent.parse_action({"content": "```\\nls -la\\n```"})\n\n\ndef | |
| test_message_history_tracking():\n """Test that messages are properly added and tracked."""\n agent = DefaultAgent(\n model=DeterministicModel(\n | |
| outputs=[\n "Response 1\\n```bash\\necho \'test1\'\\n```",\n "Response 2\\n```bash\\necho \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho | |
| \'done\'\\n```",\n ]\n ),\n env=LocalEnvironment(),\n )\n\n exit_status, result = agent.run("Track messages")\n assert exit_status == | |
| "Submitted"\n assert result == "done"\n\n # After completion should have full conversation\n assert len(agent.messages) == 6\n assert [msg["role"] for msg in | |
| agent.messages] == ["system", "user", "assistant", "user", "assistant", "user"]\n\n\ndef test_multiple_steps_before_completion():\n """Test agent can handle multiple | |
| steps before finding completion signal."""\n agent = DefaultAgent(\n model=DeterministicModel(\n outputs=[\n "Step 1\\n```bash\\necho | |
| \'first\'\\n```",\n "Step 2\\n```bash\\necho \'second\'\\n```",\n "Step 3\\n```bash\\necho \'third\'\\n```",\n "Final | |
| step\\n```bash\\necho \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho \'completed all steps\'\\n```",\n ]\n ),\n env=LocalEnvironment(),\n | |
| cost_limit=5.0, # Increase cost limit to allow all 4 calls (4.0 total cost)\n )\n\n exit_status, result = agent.run("Multi-step task")\n assert exit_status == | |
| "Submitted"\n assert result == "completed all steps"\n assert agent.model.n_calls == 4\n\n # Check that all intermediate outputs are captured (final step | |
| doesn\'t get observation due to termination)\n observations = [\n msg["content"] for msg in agent.messages if msg["role"] == "user" and "Observation:" in | |
| msg["content"]\n ]\n assert len(observations) == 3\n assert "first" in observations[0]\n assert "second" in observations[1]\n assert "third" in | |
| observations[2]\n\n\ndef test_custom_config():\n """Test agent works with custom configuration."""\n agent = DefaultAgent(\n model=DeterministicModel(\n | |
| outputs=["Test response\\n```bash\\necho \'MINI_SWE_AGENT_FINAL_OUTPUT\'\\necho \'custom config works\'\\n```"]\n ),\n env=LocalEnvironment(),\n | |
| system_template="You are a test assistant.",\n instance_template="Task: {{task}}. Return bash command.",\n step_limit=2,\n cost_limit=1.0,\n )\n\n | |
| exit_status, result = agent.run("Test custom config")\n assert exit_status == "Submitted"\n assert result == "custom config works"\n assert | |
| agent.messages[0]["content"] == "You are a test assistant."\n assert "Test custom config" in agent.messages[1]["content"]\n', 'returncode': 0} | |
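| [Editor's note] The dump above is the repository's DefaultAgent test suite, printed by the agent as a raw observation. As a hedged sketch for reproducing one of these tests locally — the file path below is hypothetical, so locate the real one first, e.g. with `grep -rl test_step_limit_enforcement tests/`: | |
| ```bash | |
| # Path is an assumption — adjust to the repository's actual test layout. | |
| pytest -v tests/test_agents.py::test_step_limit_enforcement | |
| ``` | |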
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 11, $0.09): | |
| Now let me check if there are any GitHub Actions or CI configuration files to understand the testing workflow. | |
| ```bash | |
| ls -la .github/workflows/ && echo "---" && cat .github/workflows/*.yml | head -50 | |
| ``` | |
| User: | |
| Observation: {'output': 'total 40\ndrwxr-xr-x@ 7 j staff 224 Jul 28 23:13 .\ndrwxr-xr-x@ 10 j staff 320 Jul 28 23:13 ..\n-rw-r--r--@ 1 j staff 1434 Jul 28 23:13 | |
| build-docs.yaml\n-rw-r--r--@ 1 j staff 345 Jul 25 03:21 check-links.yaml\n-rw-r--r--@ 1 j staff 1421 Jul 28 23:13 pylint.yaml\n-rw-r--r--@ 1 j staff 1893 Jul 28 | |
| 23:13 pytest.yaml\n-rw-r--r--@ 1 j staff 3203 Jul 28 23:13 release.yaml\n---\ncat: .github/workflows/*.yml: No such file or directory\n', 'returncode': 0} | |
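| [Editor's note] The `cat` half of the command failed because the glob asked for `*.yml` while every workflow file in the listing uses the `.yaml` extension. A pattern covering both suffixes avoids the miss: | |
| ```bash | |
| # *.y*ml matches both .yml and .yaml | |
| cat .github/workflows/*.y*ml | head -50 | |
| ``` | |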
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 12, $0.10): | |
| Let me check the pytest workflow file to understand the testing setup. | |
| ```bash | |
| cat .github/workflows/pytest.yaml | |
| ``` | |
| User: | |
| Observation: {'output': "\nname: Pytest\n\nenv:\n GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n\non:\n push:\n branches:\n - main\n paths-ignore:\n - | |
| 'docs/**'\n - 'README.md'\n - 'mkdocs.yml'\n - '.cursor/**'\n - '.github/workflows/build-docs.yaml'\n - '.github/workflows/release.yaml'\n - | |
| '.github/workflows/pylint.yaml'\n pull_request:\n branches:\n - main\n paths-ignore:\n - 'docs/**'\n - 'README.md'\n - 'mkdocs.yml'\n - | |
| '.cursor/**'\n - '.github/workflows/build-docs.yaml'\n - '.github/workflows/release.yaml'\n - '.github/workflows/pylint.yaml'\n\njobs:\n test:\n | |
| runs-on: ubuntu-latest\n defaults:\n run:\n shell: bash -l {0}\n steps:\n - name: Checkout code\n uses: actions/checkout@v4\n - uses: | |
| actions/setup-python@v5\n with:\n python-version: '3.11'\n - name: Install podman and singularity\n run: |\n sudo apt-get update\n | |
| sudo apt-get install -y podman\n # Install singularity/apptainer\n sudo apt-get install -y wget\n cd /tmp\n wget | |
| https://github.com/apptainer/apptainer/releases/download/v1.3.4/apptainer_1.3.4_amd64.deb\n sudo dpkg -i apptainer_1.3.4_amd64.deb || sudo apt-get install -f | |
| -y\n # Create singularity symlink for compatibility\n sudo ln -sf /usr/bin/apptainer /usr/bin/singularity\n - name: Install uv\n run: | |
| |\n curl -LsSf https://astral.sh/uv/install.sh | sh\n - name: Install dependencies\n run: |\n uv pip install --python ${Python_ROOT_DIR} | |
| -e '.[full]'\n - name: Run pytest\n run: pytest -v --cov --cov-branch --cov-report=xml -n auto\n - name: Upload coverage reports to Codecov\n | |
| uses: codecov/codecov-action@v5\n with:\n token: ${{ secrets.CODECOV_TOKEN }}\n slug: SWE-agent/mini-swe-agent", 'returncode': 0} | |
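| [Editor's note] The workflow above fully specifies the test invocation, so CI can be mirrored locally. A minimal sketch lifted straight from pytest.yaml (it assumes `uv` is installed and that something like pytest-xdist provides `-n auto`, as it evidently does in CI): | |
| ```bash | |
| # Same install and test commands the CI job runs. | |
| uv pip install -e '.[full]' && pytest -v --cov --cov-branch --cov-report=xml -n auto | |
| ``` | |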
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT::1-tdd-reviewer (step 13, $0.12): | |
| Now let me examine the coverage and testing configuration to get a complete picture. | |
| ```bash | |
| echo "MINI_SWE_AGENT_FINAL_OUTPUT" && echo "" && echo "## TDD Review of mini-swe-agent README.md" && echo "" && echo "### Analysis Summary" && echo "" && echo "After | |
| reviewing the README.md file and examining the project structure, testing setup, and actual test files, here is my evaluation from a TDD perspective:" && echo "" && echo | |
| "### 1. Testing Approach Documentation" && echo "" && echo "**STRENGTHS:**" && echo "- The README mentions the project is 'Tested' with a Codecov badge showing test | |
| coverage" && echo "- Links to comprehensive documentation that likely contains more testing details" && echo "- Shows the project has a mature CI/CD pipeline with | |
| automated testing" && echo "" && echo "**WEAKNESSES:**" && echo "- No explicit mention of TDD methodology or testing philosophy in the README" && echo "- No description | |
| of the testing approach (unit tests, integration tests, etc.)" && echo "- Missing information about how to run tests locally for contributors" && echo "" && echo "### 2. | |
| TDD Alignment Assessment" && echo "" && echo "**POSITIVE INDICATORS:**" && echo "- Comprehensive test suite with 100+ test files covering agents, environments, models, | |
| and configurations" && echo "- Well-structured test organization mirroring the source code structure" && echo "- Use of pytest with proper fixtures and test isolation | |
| (conftest.py with global state management)" && echo "- Automated testing in CI with coverage reporting" && echo "- Test files follow good naming conventions (test_*.py)" | |
| && echo "" && echo "**AREAS FOR IMPROVEMENT:**" && echo "- No explicit documentation of TDD red-green-refactor cycles" && echo "- README doesn't emphasize test-first | |
| development approach" && echo "- Missing guidance for contributors on writing tests before implementation" && echo "" && echo "### 3. Specific TDD Recommendations for | |
| README" && echo "" && echo "**Add a Testing Section:**" && echo "```markdown" && echo "## Testing & Development" && echo "" && echo "mini-swe-agent follows Test-Driven | |
| Development (TDD) principles:" && echo "" && echo "### Running Tests" && echo "```bash" && echo "# Install development dependencies" && echo "pip install -e '.[dev]'" && | |
| echo "" && echo "# Run all tests" && echo "pytest" && echo "" && echo "# Run with coverage" && echo "pytest --cov --cov-report=html" && echo "```" && echo "" && echo "### | |
| Contributing with TDD" && echo "When contributing new features:" && echo "1. **RED**: Write a failing test that describes the desired behavior" && echo "2. **GREEN**: | |
| Write minimal code to make the test pass" && echo "3. **REFACTOR**: Improve code quality while keeping tests green" && echo "4. **COMMIT**: Commit your changes with | |
| descriptive messages" && echo "" && echo "See our [Testing Guide](link-to-testing-docs) for detailed examples." && echo "```" && echo "" && echo "**Enhance the 'Tested' | |
| bullet point:**" && echo "Instead of just '**Tested:** []'" && echo "Use: '**Test-Driven:** Comprehensive test suite with >90% coverage following TDD | |
| principles []'" && echo "" && echo "### 4. Documentation Structure Support for TDD" && echo "" && echo "**STRENGTHS:**" && echo "- Good separation of | |
| concerns in documentation structure" && echo "- Links to comprehensive docs that can contain detailed testing information" && echo "- Clear installation and usage | |
| instructions" && echo "" && echo "**RECOMMENDATIONS:**" && echo "- Add a 'Development' or 'Contributing' section with TDD workflow" && echo "- Include test command | |
| examples in the quick start" && echo "- Mention test coverage as a quality indicator" && echo "" && echo "### 5. Overall Assessment" && echo "" && echo "**Score: 7/10 for | |
| TDD Support**" && echo "" && echo "The project demonstrates good testing practices with comprehensive test coverage and proper CI/CD, but the README lacks explicit TDD | |
| methodology documentation. The testing infrastructure is solid, but contributors would benefit from clearer guidance on the TDD workflow." && echo "" && echo "**Key | |
| Improvements Needed:**" && echo "1. Add explicit TDD methodology section" && echo "2. Include local testing instructions" && echo "3. Provide contributor guidelines for | |
| test-first development" && echo "4. Emphasize testing as a core project value, not just a quality metric" && echo "" && echo "The project appears to follow good testing | |
| practices internally, but the README could better communicate this commitment to potential contributors and users." | |
| ``` | |
| Agent wants to finish. Type a comment to give it a new task or press enter to quit. | |
| > | |
| User: | |
| ## TDD Review of mini-swe-agent README.md | |
| ### Analysis Summary | |
| After reviewing the README.md file and examining the project structure, testing setup, and actual test files, here is my evaluation from a TDD perspective: | |
| ### 1. Testing Approach Documentation | |
| **STRENGTHS:** | |
| - The README mentions the project is 'Tested' with a Codecov badge showing test coverage | |
| - Links to comprehensive documentation that likely contains more testing details | |
| - Shows the project has a mature CI/CD pipeline with automated testing | |
| **WEAKNESSES:** | |
| - No explicit mention of TDD methodology or testing philosophy in the README | |
| - No description of the testing approach (unit tests, integration tests, etc.) | |
| - Missing information about how to run tests locally for contributors | |
| ### 2. TDD Alignment Assessment | |
| **POSITIVE INDICATORS:** | |
| - Comprehensive test suite with 100+ test files covering agents, environments, models, and configurations | |
| - Well-structured test organization mirroring the source code structure | |
| - Use of pytest with proper fixtures and test isolation (conftest.py with global state management) | |
| - Automated testing in CI with coverage reporting | |
| - Test files follow good naming conventions (test_*.py) | |
| **AREAS FOR IMPROVEMENT:** | |
| - No explicit documentation of TDD red-green-refactor cycles | |
| - README doesn't emphasize test-first development approach | |
| - Missing guidance for contributors on writing tests before implementation | |
| ### 3. Specific TDD Recommendations for README | |
| **Add a Testing Section:** | |
| /bin/sh: command substitution: line 0: syntax error near unexpected token `(' | |
| /bin/sh: command substitution: line 0: `markdown" && echo "## Testing & Development" && echo "" && echo "mini-swe-agent follows Test-Driven Development (TDD) principles:" | |
| && echo "" && echo "### Running Tests" && echo "' | |
| bash | |
| # Install development dependencies | |
| pip install -e '.[dev]' | |
| # Run all tests | |
| pytest | |
| # Run with coverage | |
| pytest --cov --cov-report=html | |
| /bin/sh: command substitution: line 0: syntax error near unexpected token `(' | |
| /bin/sh: command substitution: line 0: `" && echo "" && echo "### Contributing with TDD" && echo "When contributing new features:" && echo "1. **RED**: Write a failing | |
| test that describes the desired behavior" && echo "2. **GREEN**: Write minimal code to make the test pass" && echo "3. **REFACTOR**: Improve code quality while keeping | |
| tests green" && echo "4. **COMMIT**: Commit your changes with descriptive messages" && echo "" && echo "See our [Testing Guide](link-to-testing-docs) for detailed | |
| examples." && echo "' | |
| **Enhance the 'Tested' bullet point:** | |
| Instead of just '**Tested:** []' | |
| Use: '**Test-Driven:** Comprehensive test suite with >90% coverage following TDD principles []' | |
| ### 4. Documentation Structure Support for TDD | |
| **STRENGTHS:** | |
| - Good separation of concerns in documentation structure | |
| - Links to comprehensive docs that can contain detailed testing information | |
| - Clear installation and usage instructions | |
| **RECOMMENDATIONS:** | |
| - Add a 'Development' or 'Contributing' section with TDD workflow | |
| - Include test command examples in the quick start | |
| - Mention test coverage as a quality indicator | |
| ### 5. Overall Assessment | |
| **Score: 7/10 for TDD Support** | |
| The project demonstrates good testing practices with comprehensive test coverage and proper CI/CD, but the README lacks explicit TDD methodology documentation. The | |
| testing infrastructure is solid, but contributors would benefit from clearer guidance on the TDD workflow. | |
| **Key Improvements Needed:** | |
| 1. Add explicit TDD methodology section | |
| 2. Include local testing instructions | |
| 3. Provide contributor guidelines for test-first development | |
| 4. Emphasize testing as a core project value, not just a quality metric | |
| The project appears to follow good testing practices internally, but the README could better communicate this commitment to potential contributors and users. | |
| ━━━ ROOT::1-tdd-reviewer finished: Submitted ━━━ | |
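| [Editor's note] The `/bin/sh: command substitution` errors in the review above trace back to the long command at step 13: the triple-backtick fences embedded in the double-quoted `echo` chain were parsed as `` `...` `` command substitution, and the `(` in "(TDD)" then tripped the parser — exactly the text echoed in the error messages. A quoted heredoc sidesteps this, since the quoted delimiter disables all expansion; a minimal sketch: | |
| ```bash | |
| # The quoted delimiter ('EOF') makes the body fully literal, so backtick | |
| # fences pass through untouched instead of being executed as substitutions. | |
| cat <<'EOF' | |
| A fenced example survives intact: ```bash echo hi ``` | |
| EOF | |
| ``` | |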
| User: | |
| <returncode></returncode> | |
| <output> | |
| Agent ROOT::1-tdd-reviewer returned: | |
| ## TDD Review of mini-swe-agent README.md | |
| ### Analysis Summary | |
| After reviewing the README.md file and examining the project structure, testing setup, and actual test files, here is my evaluation from a TDD perspective: | |
| ### 1. Testing Approach Documentation | |
| **STRENGTHS:** | |
| - The README mentions the project is 'Tested' with a Codecov badge showing test coverage | |
| - Links to comprehensive documentation that likely contains more testing details | |
| - Shows the project has a mature CI/CD pipeline with automated testing | |
| **WEAKNESSES:** | |
| - No explicit mention of TDD methodology or testing philosophy in the README | |
| - No description of the testing approach (unit tests, integration tests, etc.) | |
| - Missing information about how to run tests locally for contributors | |
| ### 2. TDD Alignment Assessment | |
| **POSITIVE INDICATORS:** | |
| - Comprehensive test suite with 100+ test files covering agents, environments, models, and configurations | |
| - Well-structured test organization mirroring the source code structure | |
| - Use of pytest with proper fixtures and test isolation (conftest.py with global state management) | |
| - Automated testing in CI with coverage reporting | |
| - Test files follow good naming conventions (test_*.py) | |
| **AREAS FOR IMPROVEMENT:** | |
| - No explicit documentation of TDD red-green-refactor cycles | |
| - README doesn't emphasize test-first development approach | |
| - Missing guidance for contributors on writing tests before implementation | |
| ### 3. Specific TDD Recommendations for README | |
| **Add a Testing Section:** | |
| /bin/sh: command substitution: line 0: syntax error near unexpected token `(' | |
| /bin/sh: command substitution: line 0: `markdown" && echo "## Testing & Development" && echo "" && echo "mini-swe-agent follows Test-Driven Development (TDD) principles:" | |
| && echo "" && echo "### Running Tests" && echo "' | |
| bash | |
| # Install development dependencies | |
| pip install -e '.[dev]' | |
| # Run all tests | |
| pytest | |
| # Run with coverage | |
| pytest --cov --cov-report=html | |
| /bin/sh: command substitution: line 0: syntax error near unexpected token `(' | |
| /bin/sh: command substitution: line 0: `" && echo "" && echo "### Contributing with TDD" && echo "When contributing new features:" && echo "1. **RED**: Write a failing | |
| test that describes the desired behavior" && echo "2. **GREEN**: Write minimal code to make the test pass" && echo "3. **REFACTOR**: Improve code quality while keeping | |
| tests green" && echo "4. **COMMIT**: Commit your changes with descriptive messages" && echo "" && echo "See our [Testing Guide](link-to-testing-docs) for detailed | |
| examples." && echo "' | |
| **Enhance the 'Tested' bullet point:** | |
| Instead of just '**Tested:** []' | |
| Use: '**Test-Driven:** Comprehensive test suite with >90% coverage following TDD principles []' | |
| ### 4. Documentation Structure Support for TDD | |
| **STRENGTHS:** | |
| - Good separation of concerns in documentation structure | |
| - Links to comprehensive docs that can contain detailed testing information | |
| - Clear installation and usage instructions | |
| **RECOMMENDATIONS:** | |
| - Add a 'Development' or 'Contributing' section with TDD workflow | |
| - Include test command examples in the quick start | |
| - Mention test coverage as a quality indicator | |
| ### 5. Overall Assessment | |
| **Score: 7/10 for TDD Support** | |
| The project demonstrates good testing practices with comprehensive test coverage and proper CI/CD, but the README lacks explicit TDD methodology documentation. The | |
| testing infrastructure is solid, but contributors would benefit from clearer guidance on the TDD workflow. | |
| **Key Improvements Needed:** | |
| 1. Add explicit TDD methodology section | |
| 2. Include local testing instructions | |
| 3. Provide contributor guidelines for test-first development | |
| 4. Emphasize testing as a core project value, not just a quality metric | |
| The project appears to follow good testing practices internally, but the README could better communicate this commitment to potential contributors and users.</output> | |
| ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── | |
| ROOT (step 14, $0.13): | |
| THOUGHT: The TDD Reviewer has provided excellent feedback on the README.md file. They've identified that while the project has good testing practices internally, the | |
| README lacks explicit documentation about TDD methodology and testing approach. The reviewer suggests adding a Testing section and improving the documentation to better | |
| support TDD workflows. Since this was a review task and the TDD Reviewer has completed their analysis, I can now finish the task. | |
| ```bash | |
| echo MINI_SWE_AGENT_FINAL_OUTPUT | |
| ``` | |
| Agent wants to finish. Type a comment to give it a new task or press enter to quit. | |
| > | |
| User: |