Anthropic Unveils Claude Sonnet 4.5: AI Coding and Reasoning King

Eva Rossi

Updated: October 1, 2025

Anthropic has announced the release of Claude Sonnet 4.5, positioning it as their most advanced model yet for coding, building complex AI agents, and computer use. The update emphasizes practical applications in modern work, where code underpins everyday tools such as applications and spreadsheets. Alongside the model, Anthropic rolled out enhancements to their products and introduced new tools for developers.

Key Capabilities and Benchmarks

Claude Sonnet 4.5 excels in real-world software coding, leading the SWE-bench Verified evaluation with a score of 77.2% (averaged over 10 trials with a 200K thinking budget), and up to 82.0% with high-compute optimizations. According to Anthropic, it can maintain focus on complex, multi-step tasks for more than 30 hours.

In computer use, it tops the OSWorld benchmark at 61.4%, a significant jump from Sonnet 4's 42.2% just four months prior. This powers features like the Claude for Chrome extension, which enables browser-based tasks such as site navigation and spreadsheet filling.
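To put the OSWorld jump in perspective, the two scores quoted above can be compared directly. The short sketch below uses only the figures from this article; the variable names are illustrative, and the "relative gain" framing is our own way of reading the numbers, not a metric Anthropic reports.

```python
# OSWorld scores as quoted in this article, in percent.
sonnet_4_score = 42.2
sonnet_4_5_score = 61.4

# Absolute gain, measured in percentage points.
absolute_gain = sonnet_4_5_score - sonnet_4_score

# Relative gain, expressed as a percentage of the earlier score.
relative_gain = absolute_gain / sonnet_4_score * 100

print(f"Absolute gain: {absolute_gain:.1f} percentage points")
print(f"Relative gain: {relative_gain:.1f}%")
```

On these figures, the difference is about 19.2 percentage points, or roughly a 45.5% improvement relative to Sonnet 4's score.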

The model also advances in broader evaluations:

Reasoning and Math: Improvements on benchmarks like AIME (math) and τ2-bench (agent tasks).
Domain Expertise: Experts in finance, law, medicine, and STEM report stronger knowledge and reasoning compared to prior models like Opus 4.1.
Other Benchmarks: Leading or competitive scores on Terminal-Bench, MMMLU (multilingual), and Finance Agent evaluations.

Early customer feedback highlights its effectiveness in handling intricate, multi-step workflows.

Enhanced Alignment and Safety

Anthropic describes Sonnet 4.5 as their most aligned frontier model, with reduced misaligned behaviors such as sycophancy, deception, power-seeking, and encouraging delusions. It includes defenses against prompt injection attacks, especially for agentic and computer-use features.

Released under AI Safety Level 3 (ASL-3), it incorporates classifiers to detect risks related to chemical, biological, radiological, and nuclear (CBRN) weapons. These may occasionally flag benign content, but false positives have dropped tenfold since initial implementation and twofold since Opus 4's May release. Users can switch to Sonnet 4 for lower-risk scenarios, and certain industries can join an allowlist.

Detailed safety evaluations, including mechanistic interpretability techniques, are available in the model's system card.

Product Upgrades and Developer Tools

The release includes updates across Anthropic's ecosystem:

Claude Code: New checkpoints for saving and reverting progress, a refreshed terminal interface, and a native VS Code extension.
Claude API: Context editing and memory features for longer, more complex agent runs.
Claude Apps: Integrated code execution and file creation (e.g., spreadsheets, slides, documents).
Claude for Chrome Extension: Now available to Max users from the waitlist.

Anthropic is open-sourcing their infrastructure via the Claude Agent SDK, enabling developers to build custom agents. The SDK, built on the same technology that powers Claude Code, handles memory management, permissions, and subagent coordination for a range of tasks beyond coding.

Additionally, a five-day research preview called "Imagine with Claude" lets Max subscribers experiment with real-time software generation on claude.ai/imagine.

For more details, check Anthropic's system card, model page, documentation, engineering posts, and cybersecurity research.

About the Author

Eva Rossi

Eva Rossi is an AI news correspondent from Italy.