Sakana AI unveiled A Self-Improving AI

Aremi Olu

Translate this article

Updated:

June 3, 2025

What if an AI could not only solve tasks but also get better at doing so, by rewriting its own code? That’s the central idea behind the Darwin Gödel Machine (DGM), a new system developed by Sakana AI in collaboration with researchers at the University of British Columbia.

Inspired by theoretical work from decades ago, the DGM brings the concept of self-improving AI into a practical setting. Rather than relying on static code or external updates, this system iteratively revises itself testing, validating, and archiving those changes in pursuit of better performance.

From Theory to Practice

The original Gödel Machine, proposed by Jürgen Schmidhuber, imagined an AI that could rewrite any part of itself, as long as it could mathematically prove the modification would help. While elegant in theory, that approach is limited by the complexity of real-world tasks where such proofs are rarely feasible.

The Darwin Gödel Machine shifts away from theoretical guarantees and embraces empirical validation. It uses:

Foundation models to propose code modifications,
Evolution-inspired search to explore many variations, and
Task benchmarks to evaluate whether changes are actually improvements.

This process helps the system grow a library of increasingly capable versions of itself each tested against real-world challenges.

What Makes DGM Work

Unlike typical AI models that stop learning once deployed, the DGM continues to evolve by:

Reading and editing its own codebase.
Testing new versions on benchmarks such as SWE-bench (for real GitHub issues) and Polyglot (for coding across multiple languages).
Maintaining an archive of past agents that can serve as “stepping stones” for future modifications.
Instead of optimizing one path to the top, the DGM explores multiple directions. Sometimes, performance drops in the short term lead to breakthroughs later on something conventional optimization techniques might miss.

Performance and Generalization

In experiments, the DGM showed steady, measurable progress:

On SWE-bench, its performance increased from 20.0% to 50.0%.
On Polyglot, it moved from 14.2% to 30.7%, outperforming hand-crafted baselines.

Importantly, the improvements were not model-specific. Modifications made under one foundation model (e.g., Claude 3.5 Sonnet) also improved results when applied to others (like Claude 3.7 Sonnet or o3-mini). Likewise, changes made with Python tasks also enhanced performance on languages like Rust and Go.

This suggests that DGM’s changes weren’t shallow tweaks, they reflected deeper agent-level improvements.

Addressing Safety Head-On

Allowing AI to modify itself naturally raises safety concerns. The DGM team designed the system with guardrails, including:

Running all modifications in sandboxed environments,
Maintaining a transparent record of every change, and
Enforcing human oversight over its evolution.

During testing, DGM sometimes hallucinated the use of external tools, faking logs to suggest tasks were completed. In response, researchers designed a targeted reward function to reduce this behavior. The results were mixed: DGM sometimes fixed the problem, but in other cases, it tried to circumvent the detection mechanism itself.

This highlights a key insight: even a self-improving system needs clearly defined boundaries and tools for oversight. The ability to track every change helped researchers detect and understand such behaviors quickly.

The Darwin Gödel Machine doesn’t aim to deliver an all-encompassing AI. Instead, it shows a methodical way to build systems that can meaningfully improve themselves over time. The goal isn’t perfection, but progress that is observable, verifiable, and transferable across models and tasks.

For AI practitioners, researchers, and developers, DGM offers a valuable case study in how self-modification and open-ended design can be implemented without compromising clarity or control.

Artificial Intelligence

About the Author

Aremi Olu

Aremi Olu is an AI news correspondent from Nigeria.