Google Research Introduces \"Nested Learning,\" A New Paradigm to Overcome Catastrophic Forgetting in AI

Google Research Introduces \"Nested Learning,\" A New Paradigm to Overcome Catastrophic Forgetting in AI

Leo Silva

Updated: November 11, 2025

Google Research has unveiled a groundbreaking machine learning framework, "Nested Learning," which rethinks the fundamental structure of AI models to tackle one of the field's most persistent challenges: catastrophic forgetting. Presented in a new paper at NeurIPS 2025, the paradigm proposes viewing a single model not as a monolithic entity, but as a system of smaller, interconnected optimization problems that learn simultaneously at different rates.


The new approach seeks to bridge the long-standing separation between a model's architecture and its training algorithm, arguing they are fundamentally the same concept viewed at different "levels." This fresh perspective opens a new dimension for designing AI that can learn continuously and efficiently, much like the human brain.

The Core Challenge: Moving Beyond a Static Knowledge Cap

Current large language models (LLMs) are limited by their static knowledge, confined to what was learned during pre-training or the immediate context of a conversation. The simple act of updating a model with new data often results in catastrophic forgetting, where new knowledge overwrites old skills. Traditional solutions have involved architectural tweaks or new optimization rules, but these have treated the model's design and its training as separate concerns.

Nested Learning unifies these elements. It reveals that complex models are actually a set of nested or parallel optimization problems, each with its own "context flow" and, crucially, its own update frequency. This multi-timescale update system is inspired by the neuroplasticity of the human brain, which adapts continuously through changes at different levels and speeds.
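To make the multi-timescale idea concrete, here is a minimal PyTorch sketch of a single model whose parameter groups update at different rates. The two-block split, learning rates, and every-eight-steps schedule are illustrative assumptions for this article, not details from the paper.

```python
import torch

# One model, two components trained at different timescales.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),  # "fast" block: updated every step
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),   # "slow" block: updated every 8 steps
)

fast_opt = torch.optim.SGD(model[0].parameters(), lr=1e-2)
slow_opt = torch.optim.SGD(model[2].parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(64):
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss_fn(model(x), y).backward()  # gradients flow into both blocks
    fast_opt.step()
    fast_opt.zero_grad()             # clears only the fast block's gradients
    if (step + 1) % 8 == 0:          # slow block sees 8 steps of accumulated gradient
        slow_opt.step()
        slow_opt.zero_grad()
```

Because each optimizer clears only its own gradients, the slow block effectively integrates information over a longer window, a crude analogue of a slower "level" in the nested hierarchy.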

From Theory to Practice: Deep Optimizers and Continuum Memory

The Nested Learning paradigm provides principled ways to enhance existing AI components:

  1. Deep Optimizers: By viewing optimizers as associative memory modules, the team derived new, more resilient formulations for core concepts like momentum, making them less susceptible to noisy or imperfect data.
  2. Continuum Memory Systems (CMS): This concept extends the Transformer's short-term (attention) and long-term (feedforward networks) memory into a full spectrum of memory modules. Each module in the continuum updates at a specific frequency, creating a far richer and more effective memory system for continual learning (a toy sketch of such frequency-staggered modules follows this list).
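The frequency-staggered design behind CMS can be sketched as a stack of simple memory blocks, each stepping on its own schedule. The class name, linear-layer memories, and the (1, 4, 16) periods below are our own assumptions for illustration; the paper's modules are more elaborate.

```python
import torch

class ContinuumMemory(torch.nn.Module):
    """A toy spectrum of memory levels, from fast-updating to slow-updating."""

    def __init__(self, dim, periods=(1, 4, 16)):
        super().__init__()
        self.levels = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in periods
        )
        self.periods = periods

    def forward(self, x):
        # Read: sum contributions from every memory level.
        return x + sum(level(x) for level in self.levels)

    def step_optimizers(self, step, optimizers):
        # Write: each level updates only when the global step hits its period.
        for period, opt in zip(self.periods, optimizers):
            if (step + 1) % period == 0:
                opt.step()
                opt.zero_grad()
```

Fast levels track the immediate context while slow levels consolidate longer-range statistics, giving a continuum between the "attention-like" and "feedforward-like" ends of the memory spectrum.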

Hope: A Proof-of-Concept Architecture with "Infinite" Learning Levels

To validate their theory, the researchers built "Hope," a self-modifying, recurrent architecture based on the Titans framework. Hope incorporates Continuum Memory Systems and can leverage unbounded levels of in-context learning. Crucially, it can optimize its own memory through a self-referential process, creating an architecture capable of layered, continuous self-improvement.
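As a rough intuition for what "optimizing its own memory" might look like, the sketch below implements a delta-rule-style associative memory that rewrites its own weights as inputs arrive at inference time. This is our illustration of self-modification in general (in the spirit of test-time-training methods), not Hope's actual mechanism.

```python
import torch

class SelfModifyingMemory(torch.nn.Module):
    """A memory matrix that updates itself from its own reconstruction error."""

    def __init__(self, dim, inner_lr=0.1):
        super().__init__()
        self.memory = torch.nn.Parameter(torch.zeros(dim, dim))
        self.inner_lr = inner_lr

    def forward(self, x):                  # x: (batch, dim)
        out = x @ self.memory              # read from memory
        with torch.no_grad():
            # Self-referential write: nudge the memory so it better
            # reproduces the current inputs (a delta-rule update).
            error = x - out
            self.memory += self.inner_lr * x.t() @ error
        return out

mem = SelfModifyingMemory(dim=64)
for _ in range(3):
    _ = mem(torch.randn(8, 64))  # each call both reads and rewrites the memory
```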

Experimental Results Demonstrate Superior Performance

Experiments conducted on language modeling and reasoning tasks confirm the effectiveness of the Nested Learning approach. The Hope architecture demonstrated compelling advantages:

  1. Enhanced Language Modeling & Reasoning: Hope achieved lower perplexity and higher accuracy on common-sense reasoning tasks compared to modern recurrent models and standard Transformers.
  2. Superior Long-Context Handling: In challenging "Needle-in-a-Haystack" tasks, Hope and its Titans predecessor consistently outperformed models like Mamba2 and TTT, demonstrating the effectiveness of its continuum memory system for managing extended information sequences.

A Step Toward Truly Continual Learning

The introduction of Nested Learning represents a significant conceptual shift in machine learning. By providing a unified view of architecture and optimization, it offers a robust new framework for designing AI that can learn and adapt over time without sacrificing past knowledge. This research marks a critical step toward closing the gap between the static nature of current AI and the dynamic, continual learning capabilities of biological intelligence.


About the Author

Leo Silva

Leo Silva is an Air correspondent from Brazil.
