Introducing Mercury: A Faster, More Efficient Approach to Language Models
Inception Labs has unveiled Mercury, a new family of diffusion-based large language models (dLLMs) designed to address the speed and cost limitations of traditional autoregressive models. Unlike conventional LLMs, which generate text one token at a time, Mercury uses a diffusion process that refines the entire output in parallel, enabling faster generation and built-in error correction.
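To make the contrast concrete, here is a minimal conceptual sketch of coarse-to-fine parallel refinement in the style of masked-diffusion text generation. This illustrates the general idea only, not Inception Labs' published algorithm; the `model` callable, the `MASK_ID` token, and the unmasking schedule are all hypothetical stand-ins.

```python
import numpy as np

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def diffusion_generate(model, length, steps=8):
    """Refine an all-masked sequence into text over a few parallel steps."""
    tokens = np.full(length, MASK_ID, dtype=np.int64)
    for step in range(steps):
        # One forward pass scores EVERY position at once, unlike an
        # autoregressive model, which emits a single token per pass.
        probs = softmax(model(tokens))   # shape: (length, vocab_size)
        best = probs.argmax(axis=-1)     # most likely token per position
        conf = probs.max(axis=-1)        # confidence per position
        # Commit a growing fraction of the most confident positions and
        # re-mask the rest, so later passes can revise earlier mistakes.
        n_keep = int(length * (step + 1) / steps)
        keep = np.argsort(-conf)[:n_keep]
        tokens = np.full(length, MASK_ID, dtype=np.int64)
        tokens[keep] = best[keep]
    return tokens
```

Because each refinement step touches all positions at once, a handful of forward passes can replace hundreds of sequential decoding steps, which is where the speed advantage comes from.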
Key Features
1. Speed: Mercury generates text at over 1000 tokens per second on NVIDIA H100 GPUs, far surpassing frontier autoregressive models, which often run at under 50 tokens per second, a throughput previously achievable only with custom hardware.
2. Efficiency: Mercury’s optimized computation reduces inference costs, enabling real-time applications like code generation and customer support without prohibitive latency.
3. Quality: Mercury Coder, the first publicly available dLLM, delivers strong results on coding benchmarks like HumanEval and MBPP, often matching or exceeding the quality of speed-optimized models while running up to 10x faster.
4. Compatibility: Mercury Coder is designed as a drop-in replacement for existing LLMs, supporting workflows like retrieval-augmented generation (RAG), tool use, and agentic systems without requiring significant infrastructure changes (see the sketch after this list).
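Because Mercury Coder is billed as a drop-in replacement, calling it can plausibly look like calling any OpenAI-compatible endpoint. The following is a hedged sketch: the base URL and model name are illustrative assumptions, not confirmed values from the article. If the endpoint is indeed OpenAI-compatible, an existing RAG or tool-use pipeline would only need its base URL and model fields changed.

```python
from openai import OpenAI

# Assumed endpoint and model identifier, shown for illustration only.
client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumption, not confirmed
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-coder-small",  # assumed model name
    messages=[
        {"role": "user",
         "content": "Write a Python function that reverses a linked list."}
    ],
)
print(response.choices[0].message.content)
```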
Mercury Coder: Optimized for Code Generation
Mercury Coder is specifically tailored for coding tasks, delivering high-quality code completions and edits at unprecedented speeds. Benchmarks show it outperforms models like GPT-4o Mini and Gemini-1.5-Flash in speed, while performing competitively in accuracy, making it a strong candidate for developers seeking faster, more efficient tools.
Practical Applications
Mercury’s speed and efficiency make it well-suited for latency-sensitive use cases such as real-time code generation, customer support, and agentic systems that chain many model calls.
About the Author
Mia Cruz
Mia Cruz is an AI news correspondent from the United States.