@johnlindquist
Created September 29, 2025 20:58
Claude Sonnet 4.5 Release Reviews Summary - September 29, 2025

Claude Sonnet 4.5 Release Reviews Summary

Date: September 29, 2025
Sources: Anthropic Official Announcement, Simon Willison's Weblog, TechCrunch


Overview

Anthropic released Claude Sonnet 4.5 today, positioning it as "the best coding model in the world" with significant improvements in software engineering, agentic capabilities, and reasoning tasks.


Key Claims from Anthropic

  • Best coding model in the world
  • Strongest model for building complex agents
  • Best model at using computers (computer use capabilities)
  • Substantial gains in reasoning and math
  • Most aligned frontier model Anthropic has ever released

Performance Highlights

Coding Excellence

  • 82.0% on SWE-bench Verified (n=500), the figure reported with parallel test-time compute (77.2% without it)
  • Comparison under the same setting: Opus 4.1 (79.4%), Sonnet 4 (80.2%)
  • Maintains focus for 30+ hours on complex, multi-step tasks
  • Capable of building "production-ready" applications rather than just prototypes
  • Represents a significant leap in reliability from previous AI models

Competitive Benchmarks

SWE-bench Verified Performance:

  • Claude Sonnet 4.5: 82.0%
  • Claude Opus 4.1: 79.4%
  • Claude Sonnet 4: 80.2%
  • GPT-5 Codex: 74.5%
  • GPT-5: 72.8%
  • Gemini 2.5 Pro: 67.2%

Note: Anthropic AI researcher David Hershey cautions that benchmarks alone don't fully capture Sonnet 4.5's performance capabilities.


Pricing & Availability

  • Input tokens: $3 per million tokens (~750,000 words)
  • Output tokens: $15 per million tokens
  • Same pricing as Claude Sonnet 4
  • Significantly cheaper than Claude Opus ($15/$75 per million tokens)
  • More expensive than GPT-5 and GPT-5-Codex (both at $1.25/$10)
  • Available now via Claude API and Claude chatbot
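
To make the price gaps above concrete, here is a minimal sketch that applies the quoted per-million-token rates to a single request. The workload (200,000 input tokens, 20,000 output tokens) and the names `PRICES` and `request_cost` are illustrative assumptions, not figures or APIs from any of the sources.

```python
# Per-million-token list prices in USD, as quoted above: (input, output).
PRICES = {
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus": (15.00, 75.00),
    "GPT-5 / GPT-5-Codex": (1.25, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 200k input tokens, 20k output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

For this assumed workload the request costs $0.90 on Sonnet 4.5, $4.50 on Opus, and $0.45 on the GPT-5 models. Real bills depend on the input/output mix and on any caching or batch discounts, which this sketch ignores.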

Product Ecosystem Updates

Anthropic released Sonnet 4.5 alongside major product enhancements:

Claude Code

  • Checkpoints feature (most requested) - saves progress and allows instant rollback
  • Refreshed terminal interface
  • Native VS Code extension
  • Improved code interpreter capabilities

Claude Agent SDK

  • Developer infrastructure for building agentic applications
  • The building blocks used to create Claude Code now available to all developers

API Enhancements

  • Context editing feature and memory tool - enable agents to run longer and handle greater complexity
  • Code execution and file creation directly in conversations

Claude for Chrome

  • Browser extension now available to Max users (previously waitlist-only)

Developer & Industry Reception

Simon Willison's Review

Tech blogger Simon Willison called it "probably the best coding model in the world (at least for now)" and highlighted:

  • Exceptional performance with Claude's code interpreter
  • Successfully completed complex multi-step tasks, including:
    • Cloning GitHub repositories
    • Installing dependencies from NPM and PyPI
    • Running test suites (466 tests passed in ~2 minutes 47 seconds)
    • Modifying a database schema to support tree-structured conversations
  • Moves faster than GPT-5-Codex
  • Competition expected soon from Google's rumored Gemini 3

Industry Context

  • Claude has become a favorite among developers and enterprises
  • Apple and Meta reportedly use Claude models internally
  • Anthropic has built a significant business selling API access to coding applications (Cursor, Windsurf, Replit)
  • Recent competition: OpenAI's GPT-5 has challenged Claude's dominance on some coding benchmarks
  • Claude models praised for strong performance on software engineering tasks

Technical Details

Focus & Reliability

  • Can maintain focus on complex tasks for extended periods (30+ hours observed)
  • Better at handling multi-step workflows and maintaining context
  • Improved ability to self-correct and iterate on solutions

Alignment Improvements

  • Described as showing "large improvements across several areas of alignment compared to previous Claude models"
  • More helpful and less likely to refuse legitimate requests
  • Better calibrated safety responses

Competitive Landscape

Strengths

  • Leading on SWE-bench Verified among the models compared above
  • Strong enterprise adoption
  • Comprehensive product ecosystem
  • Better pricing than Claude Opus while maintaining high performance

Challenges

  • OpenAI's GPT-5 is competitive on many benchmarks
  • Google's Gemini 3 rumored to be launching soon
  • Higher pricing than OpenAI's GPT-5 models

Bottom Line

Claude Sonnet 4.5 represents a significant advancement in AI coding capabilities, with reviewers praising its practical performance on real-world software engineering tasks. The simultaneous release of developer tools (Claude Code, Agent SDK) and the strong benchmark performance suggest Anthropic is making a serious push to cement Claude's position as the go-to model for software development and agentic applications.

However, with OpenAI and Google actively competing in this space, the "best coding model" title may be short-lived as the AI arms race continues to accelerate.


Sources

  1. Anthropic Official Announcement
  2. Simon Willison's Weblog Review
  3. TechCrunch Coverage

Summary compiled on September 29, 2025
