
DeepSeek R1 Model Upgraded with Enhanced Reasoning Capabilities


Jack Carter


Updated: May 31, 2025

DeepSeek has released an update to its R1 model, now versioned as DeepSeek-R1-0528, with notable improvements in reasoning and performance across multiple benchmarks. This update focuses on strengthening the model’s ability to handle complex tasks in mathematics, programming, and general logic, positioning it as a strong contender among leading AI models.


Key Improvements in DeepSeek-R1-0528

The updated model leverages increased computational resources and optimized algorithms during post-training, resulting in significant performance gains. For example, in the AIME 2025 test, the model’s accuracy improved from 70% in the previous version to 87.5% in the current one. This enhancement is attributed to deeper reasoning processes, with the model now using an average of 23,000 tokens per question compared to 12,000 previously.

Beyond reasoning, the update reduces hallucination rates, improves function-calling capabilities, and enhances support for coding tasks. The model also supports a system prompt, eliminating the need for specific tags to initiate reasoning patterns.
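As an illustration, the sketch below shows how a system prompt might be supplied when calling the updated model through an OpenAI-compatible chat endpoint. The base URL and the model identifier "deepseek-reasoner" are assumptions based on DeepSeek's public API conventions, not details confirmed in this announcement.

```python
# Minimal sketch: calling DeepSeek-R1-0528 with a system prompt.
# Assumptions: the endpoint URL and the model name "deepseek-reasoner"
# are illustrative placeholders, not confirmed by this article.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder credential
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",             # assumed model identifier
    messages=[
        # With system-prompt support, no special tags are needed to trigger reasoning.
        {"role": "system", "content": "You are a careful math tutor. Show your reasoning."},
        {"role": "user", "content": "How many prime numbers are there below 50?"},
    ],
)

print(response.choices[0].message.content)
```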


Benchmark Performance

DeepSeek-R1-0528 demonstrates strong results across various benchmarks:

  1. General Knowledge: Achieved 93.4% on MMLU-Redux (Exact Match) and 85.0% on MMLU-Pro, compared to 92.9% and 84.0% in the prior version.
  2. Reasoning: Improved from 71.5% to 81.0% on GPQA-Diamond (Pass@1) and from 8.5% to 17.7% on Humanity’s Last Exam.
  3. Coding: Increased from 63.5% to 73.3% on LiveCodeBench (Pass@1) and from 1530 to 1930 on Codeforces-Div1 (Rating).
  4. Mathematics: Recorded 91.4% on AIME 2024 (up from 79.8%) and 79.4% on HMMT 2025 (up from 41.7%).
  5. Tool Usage: Scored 37.0% on BFCL_v3_MultiTurn and 53.5% (Airline) and 63.9% (Retail) on Tau-Bench.

These metrics highlight the model’s improved ability to tackle diverse and complex tasks with greater accuracy.


Distilled Model: DeepSeek-R1-0528-Qwen3-8B

DeepSeek also introduced a distilled version, DeepSeek-R1-0528-Qwen3-8B, created by applying chain-of-thought techniques from DeepSeek-R1-0528 to Qwen3 8B Base. This model achieves top performance among open-source models on AIME 2024, scoring 86.0%, a 10-percentage-point improvement over Qwen3 8B and comparable to the much larger Qwen3-235B-thinking model. It shares the same tokenizer configuration as DeepSeek-R1-0528 and can be run in the same way as Qwen3-8B.
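Since the distilled model uses the same tokenizer configuration and runs like Qwen3-8B, a standard Hugging Face transformers workflow should apply. The repository name below is assumed for illustration; consult the official model card for the exact identifier.

```python
# Minimal sketch: loading and prompting the distilled 8B model with transformers.
# The repo id "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" is assumed for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bfloat16 is a reasonable default for an 8B model
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```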


Licensing and Accessibility

DeepSeek-R1-0528 and its distilled variant are licensed under the MIT License, supporting both commercial use and distillation. The model, with 685 billion parameters, is available in BF16, F8_E4M3, and F32 tensor types.


The updates to DeepSeek-R1-0528 reflect a meaningful step forward in AI model development, particularly in reasoning and task-specific performance. For developers, researchers, and businesses, these improvements offer a robust tool for tackling complex problems in coding, mathematics, and general knowledge tasks. The distilled Qwen3-8B model further expands access to high-performance AI for smaller-scale applications.


For more details, visit DeepSeek's official website or refer to the research paper on arXiv.

Artificial Intelligence · Research and Innovation

About the Author

Jack Carter

Jack Carter is an AI Correspondent from the United States of America.
