NEWS HEADLINE: "When AI Stumbles and Humans Shine: ARC-AGI-2 Exposes the Real Test of Intelligence."
Today, we turn our spotlight to a challenge that has AI systems scratching their synthetic heads while humans, surprisingly, solve it with ease.
The ARC Prize Foundation has launched ARC-AGI-2, a new benchmark designed not to celebrate AI’s victories, but to reveal its gaps. Unlike traditional tests that favor specialized knowledge, ARC-AGI-2 flips the script. These are tasks that two humans can solve in under two tries — yet today's best AI systems, including those from OpenAI and DeepSeek, struggle to get even a single answer right.
ARC Prize calls this the "human-AI gap", and it's not a trick question. It's a real test of flexible thinking. Think symbolic interpretation, compositional reasoning, and applying rules differently based on context. Tasks like these are easy for the average person — but current AI? Not so much.
Even OpenAI’s o3-preview system, which previously dazzled the world on ARC-AGI-1, scores a meager 4% on ARC-AGI-2. GPT-4.5? Zero. And this isn’t just about getting answers right — it’s about how efficiently they’re reached. A key message from ARC-AGI-2: intelligence isn't just solving problems; it's doing so resourcefully.
But while most models falter, one team is charting a different course.
Tokyo-based Sakana AI has stepped into the ring with a fresh approach: AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a search algorithm inspired by how humans tackle problems through trial, error, and collaboration. Instead of relying on one model's muscle, they built a framework where multiple AI models, like o4-mini, Gemini 2.5, and DeepSeek-R1, work together. Think of it as a think tank of AI minds, each bringing a different strength to the table.
And the results? Their system solved over 30% of ARC-AGI-2’s public tasks — far above most individual models. In some cases, one model’s wrong answer became another’s winning clue. It’s a shift from brute force to smart cooperation.
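The collaboration described above can be caricatured in a few lines of Python. To be clear, this is only a toy sketch: `model_a`, `model_b`, `score`, and the fixed wider/deeper schedule are invented stand-ins for illustration, not Sakana's actual AB-MCTS, which queries real LLMs and makes its wider-versus-deeper choices adaptively rather than on a fixed rotation.

```python
# Toy sketch of a multi-model "wider vs. deeper" search loop.
# Everything here is hypothetical: the tiny functions below stand in
# for the LLMs in the real system, and the deterministic schedule
# stands in for AB-MCTS's adaptive branching decisions.

def model_a(hint):
    """Stand-in 'model': nudges a previous candidate up by one."""
    return (hint or 0) + 1

def model_b(hint):
    """Stand-in 'model': doubles a previous candidate, or guesses 5 fresh."""
    return hint * 2 if hint else 5

def score(answer, target=12):
    """Closeness to a known target stands in for verifying an ARC task."""
    return -abs(answer - target)

def ab_search(models, rounds=12):
    """Alternate between going 'wider' (a fresh attempt from scratch) and
    'deeper' (refining the best candidate so far), rotating across models
    so one model's output can seed another model's next attempt."""
    best, best_score = None, float("-inf")
    for i in range(rounds):
        model = models[i % len(models)]      # rotate across the ensemble
        hint = None if i % 3 == 0 else best  # wider on every third round
        answer = model(hint)
        if score(answer) > best_score:
            best, best_score = answer, score(answer)
    return best

print(ab_search([model_a, model_b]))  # the ensemble reaches the target: 12
```

Note that neither stand-in model reaches the target on its own: `model_b`'s fresh guess of 5 is wrong, but handed to the other model and doubled again, it becomes the winning answer, which is the "one model's wrong answer became another's winning clue" dynamic in miniature.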
And here's where things get even more interesting: the competition is open. ARC Prize 2025, hosted on Kaggle, is offering over a million dollars in prizes to teams that can design AI systems that outperform current models with efficiency, not just horsepower. Anyone can join. You'll need skill, yes, but most importantly, fresh ideas.
And ARC Prize isn’t just looking for scores. They're rewarding open-source contributions and original thinking. Last year’s contest led to more than 40 influential papers, showing that the next leap forward could come from a student in a dorm room, a researcher in Lagos, or a small team in Tokyo.
As AI’s frontier evolves, one truth is clear: we’re not just building faster engines — we’re learning how to drive them smarter.
About the Author
Liang Wei
Liang Wei is our AI correspondent from China.