INTELLECT-2: A 32B Parameter Model Trained Through Globally Distributed Reinforcement Learning
Prime Intellect has released INTELLECT-2, a 32-billion parameter language model trained using a novel decentralized approach to reinforcement learning (RL). Unlike traditional RL training, which relies on centralized GPU clusters, INTELLECT-2 was developed through asynchronous RL across a global network of permissionless compute contributors. This release demonstrates the feasibility of distributed RL for large language models (LLMs).
Training Infrastructure
To enable this distributed training, Prime Intellect developed several open-source components:

- PRIME-RL, an asynchronous reinforcement learning framework that decouples rollout generation, policy training, and weight distribution;
- SHARDCAST, a tree-topology HTTP network that efficiently propagates updated policy weights to inference workers; and
- TOPLOC, a locality-sensitive-hashing scheme for verifying inference outputs from untrusted, permissionless workers.

Together, these components enable INTELLECT-2’s training across a dynamic, global compute network; a minimal sketch of the resulting asynchronous loop follows.
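To make the asynchronous pattern concrete, here is a minimal Python sketch of the kind of decoupled rollout/training loop these components enable. All names below (`inference_worker`, `trainer`, `get_weights`, `broadcast_weights`, and so on) are illustrative assumptions, not the actual PRIME-RL API.

```python
import queue

# Rollouts flow from inference workers to the trainer through a bounded queue,
# so generation and optimization never block on each other for long.
rollout_queue: "queue.Queue[dict]" = queue.Queue(maxsize=64)

def inference_worker(get_weights, generate_rollouts, steps: int) -> None:
    """Permissionless worker: pull the latest broadcast weights, generate rollouts."""
    for _ in range(steps):
        weights = get_weights()              # e.g. fetched via a SHARDCAST-like relay
        batch = generate_rollouts(weights)   # completions for verifiable math/code tasks
        rollout_queue.put(batch)             # outputs are verified upstream (TOPLOC-style)

def trainer(update_policy, broadcast_weights, steps: int) -> None:
    """Trainer: consume rollouts asynchronously, push updated weights to workers."""
    for _ in range(steps):
        batch = rollout_queue.get()          # rollouts may lag the policy by a version
        new_weights = update_policy(batch)   # RL update on the latest batch
        broadcast_weights(new_weights)       # overlaps with ongoing rollout generation
```

In the real system the workers are remote, untrusted machines, so weight distribution and output verification (the roles of SHARDCAST and TOPLOC above) replace this in-process queue.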
Training Approach
INTELLECT-2’s training used 285,000 verifiable tasks focused on mathematics and coding, sourced from NuminaMath-1.5, Deepscaler, and SYNTHETIC-1. The training recipe included:

- double-sided GRPO clipping to stabilize asynchronous policy updates;
- aggressive gradient clipping to mitigate training instabilities;
- offline and online filtering of problems by difficulty; and
- rewards combining task correctness with length penalties, giving users control over the model’s thinking budget (see the sketch after this list).
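As a rough illustration of the last point, the sketch below combines a binary task reward with a linear penalty past a target length. The linear form and the penalty coefficient are assumptions for illustration; the technical report defines the exact length-reward formulation.

```python
def shaped_reward(correct: bool, num_tokens: int, target_len: int,
                  penalty_per_token: float = 1e-3) -> float:
    """Toy reward: binary task reward minus a linear penalty past the target length.

    The linear form and the coefficient are illustrative assumptions; see the
    INTELLECT-2 technical report for the exact length-reward formulation.
    """
    task_reward = 1.0 if correct else 0.0
    overflow = max(0, num_tokens - target_len)
    return task_reward - penalty_per_token * overflow
```

Because the penalty is keyed to a target length, shorter targets push the model toward more efficient reasoning, which is the idea behind the TARGET-SHORT experiment described next.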
Prime Intellect conducted two experiments: TARGET-SHORT, which optimized for efficient reasoning with shorter target lengths, and TARGET-LONG, the primary run with longer targets. Both runs successfully overlapped communication with computation, and the model improved its task rewards on mathematics and coding problems, though the length penalty decreased more slowly than in preliminary experiments.
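One common way to overlap the two is to run the weight broadcast in a background thread while the next optimizer step computes. The sketch below shows that pattern in the abstract; `train_step` and `broadcast` are placeholder callables, and this is not Prime Intellect’s implementation.

```python
import threading

def train_with_overlap(weights, train_step, broadcast, num_steps: int):
    """Overlap communication (weight broadcast) with computation (the next step):
    while step t's weights are being broadcast in the background, step t+1 is
    already computing."""
    pending = None
    for _ in range(num_steps):
        weights = train_step(weights)        # computation for the next step
        if pending is not None:
            pending.join()                   # wait for the previous broadcast
        pending = threading.Thread(target=broadcast, args=(weights,))
        pending.start()                      # communication runs in the background
    if pending is not None:
        pending.join()                       # flush the final broadcast
    return weights
```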
Performance and Limitations
INTELLECT-2, built on the QwQ-32B model, achieved modest improvements on mathematics and coding benchmarks. However, because QwQ-32B had already been trained extensively with RL, broad generalized gains were limited. Further improvements may require higher-quality reasoning datasets or stronger base models such as Qwen3.
Open-Source Contributions
Prime Intellect has open-sourced INTELLECT-2, its code, and data to support research in decentralized training. The model is available on Hugging Face, with a chat interface at chat.primeintellect.ai and a technical report at primeintellect.ai/intellect-2.
Future Directions
Prime Intellect plans to enhance INTELLECT-2 by increasing inference compute, integrating tools like web search and Python interpreters for multi-turn RL, crowdsourcing RL tasks, and exploring model merging via techniques like DiLoCo. These efforts aim to advance open-source, decentralized AI.
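For the model-merging direction, a DiLoCo-style outer update averages each worker’s weight delta into a pseudo-gradient and applies an outer optimizer to the shared parameters. The NumPy sketch below uses the Nesterov-momentum form and default hyperparameters from the DiLoCo paper; it is not INTELLECT-2’s configuration.

```python
import numpy as np

def diloco_outer_step(global_w: np.ndarray,
                      local_ws: list[np.ndarray],
                      outer_momentum: np.ndarray,
                      lr: float = 0.7,
                      beta: float = 0.9) -> tuple[np.ndarray, np.ndarray]:
    """One DiLoCo-style outer update (sketch): average each worker's weight delta
    into a pseudo-gradient, then apply Nesterov-momentum SGD to the global weights.
    The hyperparameters follow the DiLoCo paper's defaults, not INTELLECT-2 settings.
    """
    # Pseudo-gradient: mean of (global - local) deltas after each worker's inner steps.
    pseudo_grad = np.mean([global_w - w for w in local_ws], axis=0)
    outer_momentum = beta * outer_momentum + pseudo_grad
    # Nesterov lookahead step on the shared global parameters.
    new_global = global_w - lr * (beta * outer_momentum + pseudo_grad)
    return new_global, outer_momentum
```

In this scheme, only averaged weight deltas cross the network at each outer step, which is what makes the approach attractive for bandwidth-constrained, globally distributed training.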