Microsoft Research Introduces Rho-Alpha, an AI Model for Robotic Manipulation
Microsoft Research has announced Rho-alpha (ρα), a new artificial intelligence model designed to translate natural-language commands into control signals for robotic systems. Described as a vision-language-action (VLA+) model, it extends typical VLA capabilities by incorporating additional sensory modalities, starting with tactile sensing.
The model is derived from Microsoft's Phi family of vision-language models and focuses on bimanual manipulation, in which two robotic arms are coordinated on a single task. According to the announcement, the goal is to create physical systems that adapt more readily to dynamic situations and human preferences.
How It Was Developed and Trained
To address the scarcity of large-scale robotics training data, Microsoft's approach relies heavily on simulation. The team generates synthetic training data using reinforcement learning within the NVIDIA Isaac Sim framework, then combines it with available physical demonstration datasets. This method aims to teach the model tasks that are difficult to capture through real-world teleoperation alone.
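The announcement does not detail how the synthetic and real data are blended, but a common approach is to fix the share of real demonstrations in each training pool. The sketch below illustrates that idea; the function name, data shapes, and 30% real-data ratio are illustrative assumptions, not Microsoft's actual pipeline.

```python
# Hypothetical sketch of mixing simulated RL rollouts with real
# teleoperated demonstrations before training. Names and ratios are
# assumptions for illustration, not Rho-alpha's documented pipeline.
import random


def mix_training_data(sim_trajectories, real_demos, real_fraction=0.3, seed=0):
    """Combine simulated and real trajectories so that roughly
    `real_fraction` of the resulting pool comes from real demonstrations."""
    rng = random.Random(seed)
    n_real = len(real_demos)
    # Scale the synthetic sample so real data keeps the requested share.
    if real_fraction > 0:
        n_sim = int(n_real * (1 - real_fraction) / real_fraction)
    else:
        n_sim = len(sim_trajectories)
    sampled_sim = rng.sample(sim_trajectories, min(n_sim, len(sim_trajectories)))
    mixed = sampled_sim + list(real_demos)
    rng.shuffle(mixed)
    return mixed


# Example: 700 synthetic rollouts and 30 real demos, targeting a 30% real share.
sim = [{"source": "sim", "id": i} for i in range(700)]
real = [{"source": "real", "id": i} for i in range(30)]
batch = mix_training_data(sim, real, real_fraction=0.3)
```

With these numbers the mixed pool contains 70 simulated and 30 real trajectories, so real demonstrations make up 30% of what the policy sees during training.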
Demonstrated Capabilities
The announcement included video demonstrations of Rho-alpha controlling robots. In one setup, a dual-armed robot equipped with tactile sensors followed verbal instructions to perform tasks such as inserting a power plug into an outlet and packing a toolbox. The footage also showed the system interacting with a "BusyBox," a physical benchmark device, responding to commands like "push the green button" or "turn the knob to position 5."
Microsoft notes that the model is currently being evaluated on dual-arm setups and humanoid robots, with a detailed technical report expected in the coming months.
Availability
Microsoft is launching a Research Early Access Program for organizations interested in evaluating Rho-alpha for their specific robots and use cases. The model is also planned for future release via Microsoft Foundry. The initiative is part of Microsoft’s broader investment in “Physical AI,” where agentic AI meets physical systems.
About the Author

Aremi Olu
Aremi Olu is an AI news correspondent from Nigeria.