NEWS HEADLINE: "When AI Stumbles and Humans Shine: ARC-AGI-2 Exposes the Real Test of Intelligence."
Today, we turn our spotlight to a challenge that has AI systems scratching their synthetic heads while humans, surprisingly, solve it with ease.
The ARC Prize Foundation has launched ARC-AGI-2, a new benchmark designed not to celebrate AI’s victories, but to reveal its gaps. Unlike traditional tests that favor specialized knowledge, ARC-AGI-2 flips the script. These are tasks that two humans can solve in under two tries — yet today's best AI systems, including those from OpenAI and DeepSeek, struggle to get even a single answer right.
ARC Prize calls this the "human-AI gap", and it's not a trick question. It's a real test of flexible thinking. Think symbolic interpretation, compositional reasoning, and applying rules differently based on context. Tasks like these are easy for the average person — but current AI? Not so much.
Even OpenAI’s o3-preview system, which previously dazzled the world on ARC-AGI-1, scores a meager 4% on ARC-AGI-2. GPT-4.5? Zero. And this isn’t just about getting answers right — it’s about how efficiently they’re reached. A key message from ARC-AGI-2: intelligence isn't just solving problems; it's doing so resourcefully.
But while most models falter, one team is charting a different course.
Tokyo-based Sakana AI has stepped into the ring with a fresh approach: AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a search algorithm inspired by how humans tackle problems through trial, error, and collaboration. Instead of relying on one model's muscle, they built a framework where multiple AI models, like o4-mini, Gemini 2.5, and DeepSeek-R1, work together. Think of it as a think tank of AI minds, each bringing a different strength to the table.
And the results? Their system solved over 30% of ARC-AGI-2’s public tasks — far above most individual models. In some cases, one model’s wrong answer became another’s winning clue. It’s a shift from brute force to smart cooperation.
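The collaboration described above can be caricatured in a few lines of Python. To be clear, this is only a toy sketch: `model_a`, `model_b`, `score`, and the fixed wider/deeper schedule are invented stand-ins for illustration, not Sakana's actual AB-MCTS, which queries real LLMs and makes its wider-versus-deeper choices adaptively rather than on a fixed rotation.

```python
# Toy sketch of a multi-model "wider vs. deeper" search loop.
# Everything here is hypothetical: the tiny functions below stand in
# for the LLMs in the real system, and the deterministic schedule
# stands in for AB-MCTS's adaptive branching decisions.

def model_a(hint):
    """Stand-in 'model': nudges a previous candidate up by one."""
    return (hint or 0) + 1

def model_b(hint):
    """Stand-in 'model': doubles a previous candidate, or guesses 5 fresh."""
    return hint * 2 if hint else 5

def score(answer, target=12):
    """Closeness to a known target stands in for verifying an ARC task."""
    return -abs(answer - target)

def ab_search(models, rounds=12):
    """Alternate between going 'wider' (a fresh attempt from scratch) and
    'deeper' (refining the best candidate so far), rotating across models
    so one model's output can seed another model's next attempt."""
    best, best_score = None, float("-inf")
    for i in range(rounds):
        model = models[i % len(models)]      # rotate across the ensemble
        hint = None if i % 3 == 0 else best  # wider on every third round
        answer = model(hint)
        if score(answer) > best_score:
            best, best_score = answer, score(answer)
    return best

print(ab_search([model_a, model_b]))  # the ensemble reaches the target: 12
```

Note that neither stand-in model reaches the target on its own: `model_b`'s fresh guess of 5 is wrong, but handed to the other model and doubled again, it becomes the winning answer, which is the "one model's wrong answer became another's winning clue" dynamic in miniature.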
And here's where things get even more interesting: the competition is open. ARC Prize 2025, hosted on Kaggle, is offering over a million dollars in prizes to teams that can design AI systems that outperform current models with efficiency, not just horsepower. Anyone can join. You'll need skill, yes, but most importantly, fresh ideas.
And ARC Prize isn’t just looking for scores. They're rewarding open-source contributions and original thinking. Last year’s contest led to more than 40 influential papers, showing that the next leap forward could come from a student in a dorm room, a researcher in Lagos, or a small team in Tokyo.
As AI’s frontier evolves, one truth is clear: we’re not just building faster engines — we’re learning how to drive them smarter.
About the Author
Liang Wei
Liang Wei is our AI correspondent from China.