
Think Deeper, Act Faster with Qwen3: Open-Weight Models with Advanced Features

Mia Cruz

Updated:
April 30, 2025

The Qwen Team has launched Qwen3, the latest in their large language model series, offering competitive performance in coding, math, and general tasks. With open-weight models, hybrid thinking modes, and support for 119 languages, Qwen3 aims to advance global AI research and development. This release enhances efficiency and versatility, empowering researchers and developers.


Open-Weight Models

Qwen3 includes two Mixture-of-Experts (MoE) models, Qwen3-235B-A22B (235 billion total parameters, 22 billion activated) and Qwen3-30B-A3B (30 billion total, 3 billion activated), plus six dense models: Qwen3-32B, 14B, 8B, 4B, 1.7B, and 0.6B. All are released as open weights under the Apache 2.0 license. Qwen3-30B-A3B outperforms QwQ-32B with roughly one-tenth of the activated parameters, and Qwen3-4B rivals Qwen2.5-72B-Instruct. Both pre-trained and post-trained checkpoints, such as Qwen3-30B-A3B and its base version, are available on Hugging Face, ModelScope, and Kaggle.
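
The checkpoints can be pulled with standard open-source tooling. As a minimal sketch, the snippet below loads the Qwen3-30B-A3B chat model with Hugging Face Transformers, assuming a recent transformers release (plus accelerate for device placement) and enough GPU memory; any of the smaller dense checkpoints can be substituted the same way.

    # Minimal sketch: loading an open-weight Qwen3 checkpoint from Hugging Face.
    # Assumes a recent `transformers` release, `accelerate` for device placement,
    # and enough GPU memory for the chosen model size.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3-30B-A3B"  # swap in Qwen3-0.6B, Qwen3-32B, etc.

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # spread layers across available devices
    )

    messages = [{"role": "user", "content": "Summarize Qwen3 in one sentence."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))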


Hybrid Thinking Modes

Qwen3 offers two problem-solving modes: Thinking Mode, which reasons step by step through complex tasks, and Non-Thinking Mode, which returns near-instant answers to simple queries. Users can switch between them mid-conversation with “/think” or “/no_think” prompts, giving stable control over the thinking budget. Performance scales with the reasoning budget allocated, so users can balance cost and inference quality for each task.
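
As a rough illustration of the soft switch, the sketch below appends “/no_think” and “/think” to user turns in a multi-turn chat. It assumes a Qwen3 model is served behind an OpenAI-compatible endpoint; the base URL and model name are placeholders, not official values.

    # Illustrative sketch of the "/think" and "/no_think" soft switches across turns.
    # Assumes a Qwen3 model behind an OpenAI-compatible endpoint; the URL and model
    # name below are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    model = "Qwen3-30B-A3B"

    # Simple lookup: skip the reasoning trace for a near-instant answer.
    messages = [{"role": "user", "content": "What is 17 * 23? /no_think"}]
    reply = client.chat.completions.create(model=model, messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})

    # Harder follow-up: re-enable step-by-step reasoning for this turn only.
    messages.append({"role": "user",
                     "content": "Is that product divisible by 7? Show your reasoning. /think"})
    reply = client.chat.completions.create(model=model, messages=messages)
    print(reply.choices[0].message.content)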


119 Languages Supported

Qwen3 supports 119 languages and dialects, including Indo-European (e.g., English, Russian), Sino-Tibetan (e.g., Chinese), Afro-Asiatic (e.g., Arabic), Austronesian (e.g., Indonesian), Dravidian (e.g., Tamil), Turkic (e.g., Turkish), and others like Japanese and Swahili. This multilingual capability supports global applications, from chatbots to research.


Robust Pretraining

Qwen3’s pretraining dataset spans 36 trillion tokens across 119 languages, nearly double Qwen2.5’s 18 trillion. It includes web content, PDF-extracted text via Qwen2.5-VL, and synthetic math/coding data from Qwen2.5-Math and Qwen2.5-Coder. The three-stage process involved:

  1. Pretraining on over 30 trillion tokens at a 4K context length to build basic language skills.
  2. Pretraining on an additional 5 trillion tokens weighted toward knowledge-intensive data such as STEM, coding, and reasoning.
  3. Training on high-quality long-context data to extend the context window to 32K tokens, ensuring the model handles longer inputs effectively.

As a result, the Qwen3 dense base models (1.7B, 4B, 8B, 14B, and 32B) match larger Qwen2.5 base models (3B, 7B, 14B, 32B, and 72B) and often outperform them in STEM, coding, and reasoning, while the MoE base models reach comparable performance with only about 10% of the active parameters, reducing training and inference costs.


Post-Training Pipeline

Qwen3’s four-stage post-training process enables hybrid capabilities:

  1. Fine-tuning on long chain-of-thought data spanning math, coding, reasoning, and STEM.
  2. Reinforcement learning with rule-based rewards to scale up exploration.
  3. Fusing the thinking and non-thinking modes by training on a blend of chain-of-thought and instruction-tuning data.
  4. Reinforcement learning on over 20 general tasks, including instruction following and agent capabilities.

Together, these stages make the models proficient at deep reasoning, rapid responses, and agentic tasks such as tool use; a rough tool-calling sketch follows below.
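
To make the agentic side concrete, the sketch below shows one way to exercise tool calling against a Qwen3 model served behind an OpenAI-compatible endpoint. The base URL, model name, and get_weather tool are placeholders for illustration, not values from the Qwen3 release.

    # Illustrative tool-calling sketch. Assumes a Qwen3 model is served behind an
    # OpenAI-compatible endpoint; the URL, model name, and get_weather tool are
    # placeholders.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="Qwen3-30B-A3B",
        messages=[{"role": "user", "content": "Do I need an umbrella in Jakarta today?"}],
        tools=tools,
    )

    message = resp.choices[0].message
    if message.tool_calls:
        # The model chose to call the tool; the structured arguments arrive as JSON.
        call = message.tool_calls[0]
        print(call.function.name, json.loads(call.function.arguments))
    else:
        print(message.content)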


Qwen3’s open-weight models, hybrid thinking, and multilingual support empower researchers and developers. Open-sourcing fosters collaboration, driving AI innovation. Explore Qwen3 on Qwen Chat or Hugging Face to see its potential.


Future Plans

Qwen3 is a step toward AGI and ASI, with plans to refine architectures, scale data, increase model size, extend context length, broaden modalities, and advance reinforcement learning for long-horizon reasoning. The focus will shift to training agents with environmental feedback.

Artificial Intelligence, Big Data

About the Author

Mia Cruz

Mia Cruz is an AI news correspondent based in the United States of America.
