Introducing Stable Virtual Camera: Multi-View Video Generation with 3D Camera Control.

Liang Wei

Updated: March 20, 2025

Stability AI has announced the release of Stable Virtual Camera, a multi-view diffusion model that transforms 2D images into immersive 3D videos with realistic depth and perspective, without complex reconstruction or scene-specific optimization. Stable Virtual Camera combines the familiar control of traditional virtual cameras with the power of generative AI to deliver precise, intuitive control over 3D video outputs.

Unlike traditional 3D models, Stable Virtual Camera generates novel views of a scene from one or more input images at user-specified camera angles, delivering seamless trajectory videos across dynamic camera paths.

Capabilities

Stable Virtual Camera offers advanced capabilities for generating 3D videos, including:

Dynamic Camera Control: Offers customizable camera trajectories with various dynamic paths, including 360°, Lemniscate (∞ shape), Spiral, Dolly Zoom In, Dolly Zoom Out, Zoom In, Zoom Out, Move Forward, Move Backward, Pan Up, Pan Down, Pan Left, Pan Right, and Roll.
Flexible Inputs: Creates 3D videos from a single input image or up to 32 images.
Multiple Aspect Ratios: Supports video production in square (1:1), portrait (9:16), landscape (16:9), and other custom aspect ratios, all without the need for extra training.
Long Video Generation: Maintains 3D consistency in videos up to 1,000 frames, ensuring smooth loops and seamless transitions, even when revisiting the same perspectives.
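To make the idea of a camera trajectory concrete, here is a minimal sketch of how two of the paths listed above (360° and Lemniscate) could be described as sequences of camera positions, each with a look-at orientation toward the scene. It is an illustration only: the function names, the coordinate convention (y up, scene at the origin), and the default radius and depth values are assumptions, not part of the released Stable Virtual Camera code.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)

def orbit_positions(num_frames, radius=2.0, height=0.0):
    """Camera centers evenly spaced on a full 360-degree circle around the origin."""
    positions = []
    for i in range(num_frames):
        angle = 2.0 * math.pi * i / num_frames
        positions.append((radius * math.sin(angle), height, radius * math.cos(angle)))
    return positions

def lemniscate_positions(num_frames, scale=1.0, depth=2.0):
    """Camera centers tracing a figure-eight (lemniscate of Bernoulli) in a plane at z = depth."""
    positions = []
    for i in range(num_frames):
        t = 2.0 * math.pi * i / num_frames
        denom = 1.0 + math.sin(t) ** 2
        x = scale * math.cos(t) / denom
        y = scale * math.sin(t) * math.cos(t) / denom
        positions.append((x, y, depth))
    return positions

def look_at(eye, target=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0)):
    """Return the camera's right, up, and forward axes in world coordinates.

    Assumes the viewing direction is not parallel to `up`.
    """
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    return [right, true_up, forward]

# Usage: one (position, orientation) pair per output frame of an orbit.
poses = [(eye, look_at(eye)) for eye in orbit_positions(num_frames=80)]
```

Other paths in the list, such as Spiral or Move Forward, would differ only in how the sequence of positions is generated; the look-at step stays the same.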

Research and Model Architecture

Stable Virtual Camera has achieved state-of-the-art results in novel view synthesis (NVS) benchmarks, outperforming models like ViewCrafter and CAT3D. Its success lies in its multi-view diffusion model architecture, which is trained with a fixed sequence length but can accommodate variable input and output lengths during sampling. This is achieved through a two-pass procedural sampling process, ensuring smooth and consistent results.
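The following sketch shows one way a two-pass schedule could let a model with a fixed sequence length cover a much longer output. It only plans which frames go into which model call; no model is invoked. The first pass picks a sparse set of anchor frames spread over the whole trajectory, and the second pass fills the gaps between consecutive anchors in chunks that fit the window. The anchor spacing, the chunking rule, and the names `plan_two_pass` and `window` are assumptions made for illustration, not the released implementation.

```python
def plan_two_pass(num_targets, window, num_inputs=1):
    """Plan frame indices for a hypothetical two-pass sampling schedule.

    window     : fixed number of frames the model handles per call (assumed).
    num_inputs : input images that occupy slots in the first pass.
    Returns (anchors, chunks), where each chunk names the two anchors it is
    conditioned on and the in-between frames it generates.
    """
    if num_inputs < 1 or num_targets < 1:
        raise ValueError("need at least one input image and one target frame")
    slots = window - num_inputs  # frames that pass 1 can generate next to the inputs
    if slots < 2:
        raise ValueError("window must leave room for at least two anchor frames")

    # Pass 1: anchors spread evenly over the trajectory, always including both ends.
    num_anchors = min(slots, num_targets)
    if num_anchors == 1:
        anchors = [0]
    else:
        anchors = [(k * (num_targets - 1)) // (num_anchors - 1)
                   for k in range(num_anchors)]

    # Pass 2: fill each gap, conditioning on the two anchors that bound it.
    chunk_capacity = window - 2  # two slots are reserved for the bounding anchors
    chunks = []
    for left, right in zip(anchors, anchors[1:]):
        between = list(range(left + 1, right))
        for start in range(0, len(between), chunk_capacity):
            chunks.append({
                "condition_on": (left, right),
                "generate": between[start:start + chunk_capacity],
            })
    return anchors, chunks

# Usage with hypothetical sizes: 1,000 output frames, a 21-frame window, one input image.
anchors, chunks = plan_two_pass(num_targets=1000, window=21, num_inputs=1)
```

Because every second-pass chunk is tied to anchors that were generated together in the first pass, distant parts of the video share a common reference, which is the intuition behind keeping long trajectories consistent.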

Limitations

Stable Virtual Camera is in its initial version and may produce lower-quality results in certain scenarios. Input images featuring humans, animals, or dynamic textures like water often lead to degraded outputs.

Availability

Stable Virtual Camera is available for research use under a Non-Commercial License. You can download the weights on Hugging Face (https://huggingface.co/stabilityai/stable-virtual-camera) and access the code on GitHub (https://github.com/Stability-AI/stable-virtual-camera).
