Anthropic Unveils Claude Sonnet 4.5: AI Coding and Reasoning King
Anthropic has announced the release of Claude Sonnet 4.5, positioning it as its most advanced model yet for coding, building complex AI agents, and computer use. The update emphasizes practical applications in modern work, where code underpins everyday tools such as applications and spreadsheets. Alongside the model, Anthropic rolled out enhancements to its products and introduced new developer tools.
Key Capabilities and Benchmarks
Claude Sonnet 4.5 excels in real-world software coding, leading the SWE-bench Verified evaluation with a score of 77.2% (averaged over 10 trials with a 200K thinking budget), and reaching up to 82.0% with high-compute optimizations. According to Anthropic, it can maintain focus on complex, multi-step tasks for more than 30 hours.
In computer use, it tops the OSWorld benchmark at 61.4%, a significant jump from Sonnet 4's 42.2% just four months prior. This powers features like the Claude for Chrome extension, which enables browser-based tasks such as site navigation and spreadsheet filling.
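To put the OSWorld jump in perspective, the short sketch below computes both the absolute gain in percentage points and the relative gain from the two scores quoted above. The helper name `improvement` is purely illustrative and not part of any Anthropic tooling.

```python
def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return absolute, relative


# OSWorld scores as quoted in this article.
sonnet_4, sonnet_4_5 = 42.2, 61.4

points, percent = improvement(sonnet_4, sonnet_4_5)
print(f"{points:.1f} percentage points, {percent:.1f}% relative gain")
```

On these figures the gain is about 19.2 percentage points, or roughly a 45% relative improvement over Sonnet 4.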
The model also advances in broader evaluations:
Reasoning and Math: Improvements on benchmarks like AIME (math) and τ2-bench (agent tasks)
Early customer feedback highlights its effectiveness in handling intricate, multi-step workflows.
Enhanced Alignment and Safety
Anthropic describes Sonnet 4.5 as their most aligned frontier model, with reduced misaligned behaviors such as sycophancy, deception, power-seeking, and encouraging delusions. It includes defenses against prompt injection attacks, especially for agentic and computer-use features.
Released under AI Safety Level 3 (ASL-3) protections, it incorporates classifiers designed to detect risks related to chemical, biological, radiological, and nuclear (CBRN) weapons. These classifiers may occasionally flag benign content, but false positives have dropped tenfold since the initial implementation and twofold since Opus 4's release in May. Users whose conversations are flagged can switch to Sonnet 4, and certain industries can join an allowlist.
Detailed safety evaluations, including mechanistic interpretability techniques, are available in the model's system card.
Product Upgrades and Developer Tools
The release includes updates across Anthropic's ecosystem:
Anthropic is open-sourcing their infrastructure via the Claude Agent SDK, enabling developers to build custom agents. This SDK, based on tech powering Claude Code, handles memory management, permissions, and subagent coordination for diverse tasks beyond coding.
Additionally, a five-day research preview called "Imagine with Claude" lets Max subscribers experiment with real-time software generation on claude.ai/imagine.
For more details, check Anthropic's system card, model page, documentation, engineering posts, and cybersecurity research.
About the Author
Eva Rossi
Eva Rossi is an AI news correspondent from Italy.