- Go to https://huggingface.co/papers and click through each of the top 3 upvoted papers.
- For each paper:
  - Record the title, URL and upvotes
  - Summarise the abstract section
- Finally, compile together a summary of all 3 papers, ranked by upvotes
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- passed 100%
model: Devstral-small
nanobrowser version: v0.1.6
result:
- passed 100%
Go to TechCrunch and extract top 10 headlines from the last 24 hours
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- passed 100%
model: Devstral-small
nanobrowser version: v0.1.6
result:
- passed, but lists 11 items instead of 10
Look for the trending Python repositories on GitHub with most stars
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- failed: cannot find the trending repo page
model: Devstral-small
nanobrowser version: v0.1.6
result:
- failed: cannot find the trending repo page
go to https://github.com/trending and look for the trending Python repositories with most stars, scroll down if needed.
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- failed: does not scroll down to get the whole page content, even when explicitly told to
model: Devstral-small
nanobrowser version: v0.1.6
result:
- passed 100%
go to ebay and find the cheapest epyc milan 64-core cpu
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- passed: found the correct listing
- failed: after that, with this message from the planner:
Planning failed: Failed to invoke qwen3gemini with structured output: Error: Invalid boolean string
model: Devstral-small
nanobrowser version: v0.1.6
result:
- failed: planner gets stuck, repeatedly sending one token; message from the planner:
Planning failed: Failed to invoke devstral with structured output: Error: Invalid boolean string
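The log only shows the error text, not the code that raised it. One plausible reading (an assumption, not confirmed by anything above) is that a boolean field in the planner's structured output came back as something other than `true` or `false`. The sketch below shows how a strict parser could produce a message of this shape; `parse_strict_bool` and `try_parse` are hypothetical names for illustration and are not nanobrowser functions.

```python
def parse_strict_bool(raw: str) -> bool:
    """Hypothetical strict parser: accepts only 'true' or 'false' (case-insensitive)."""
    text = raw.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Anything else (e.g. "yes", "done", a repeated token) is rejected.
    raise ValueError("Invalid boolean string")


def try_parse(raw: str) -> str:
    """Return the parsed value as text, or the error message if parsing fails."""
    try:
        return str(parse_strict_bool(raw))
    except ValueError as err:
        return f"error: {err}"


if __name__ == "__main__":
    for candidate in ["true", "False", "yes"]:
        print(candidate, "->", try_parse(candidate))
```

If this reading is right, a model that drifts from the expected schema would fail at the parsing step even after the browsing itself succeeded, which would match the Ophiuchi-Qwen3-14B result above.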
get the 3 most recent papers about cancer from Pubmed
model: Ophiuchi-Qwen3-14B
nanobrowser version: v0.1.5
result:
- passed: found 3 papers about cancer
- failed: not the 3 most recent ones
model: Devstral-small
nanobrowser version: v0.1.6
result:
- passed: found 3 papers about cancer
- failed: not the 3 most recent ones
get the 3 most recent papers about cancer from Pubmed use sort function of the site if needed
model: Devstral-small
nanobrowser version: v0.1.6
result:
- passed 100%
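To compare the two models at a glance, the results in this log can be tallied with a short script. This is a minimal sketch: the task labels are shorthand invented here, and the three-way `pass` / `partial` / `fail` classification is an editorial assumption (a run that "passed, but..." or has both a passed and a failed line is counted as `partial`).

```python
from collections import Counter

# (task, model, outcome) transcribed from the log above.
# Task labels and the "partial" category are assumptions made for this sketch.
RESULTS = [
    ("hf-papers", "Ophiuchi-Qwen3-14B", "pass"),
    ("hf-papers", "Devstral-small", "pass"),
    ("techcrunch", "Ophiuchi-Qwen3-14B", "pass"),
    ("techcrunch", "Devstral-small", "partial"),  # listed 11 items instead of 10
    ("github-trending", "Ophiuchi-Qwen3-14B", "fail"),
    ("github-trending", "Devstral-small", "fail"),
    ("github-trending-url", "Ophiuchi-Qwen3-14B", "fail"),
    ("github-trending-url", "Devstral-small", "pass"),
    ("ebay-epyc", "Ophiuchi-Qwen3-14B", "partial"),  # found listing, planner then failed
    ("ebay-epyc", "Devstral-small", "fail"),
    ("pubmed", "Ophiuchi-Qwen3-14B", "partial"),  # 3 papers, not the most recent
    ("pubmed", "Devstral-small", "partial"),
    ("pubmed-sort-hint", "Devstral-small", "pass"),  # Ophiuchi run not recorded
]


def tally(results):
    """Count outcomes per model."""
    counts = {}
    for _task, model, outcome in results:
        counts.setdefault(model, Counter())[outcome] += 1
    return counts


if __name__ == "__main__":
    for model, outcomes in tally(RESULTS).items():
        print(model, dict(outcomes))
```

Note that the two models were run on different nanobrowser versions (v0.1.5 and v0.1.6) and Devstral-small has one extra run, so the counts are not a like-for-like comparison.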