Checkpoints are live. That changes the conversation.
When NVIDIA announced Cosmos 3 on June 3, the story was architectural: a Mixture-of-Transformers model that integrates vision-language reasoning, image generation, audio-visual generation, robot policy, forward dynamics, and inverse dynamics into a single omnimodal world model. No chaining multiple specialized models. One forward pass across modalities. The NVIDIA Research page states this directly: “Cosmos 3 connects understanding, generation, simulation, and action through a shared omnimodal world model that moves fluidly across text, images, video, audio, and actions.”
Now the weights are downloadable. The code is at github.com/nvidia/cosmos. The collection is on Hugging Face. That’s the difference between a research announcement and a deployment decision.
Cosmos 3 ships in two sizes. The Super configuration runs 64 billion total parameters: a 32B Reasoner and a 32B Generator working in tandem. The Nano is 8 billion parameters, sized for workstation-grade hardware. For robotics teams running inference on edge nodes, Nano is the relevant variant. For teams building simulation pipelines or training policy models in the cloud, Super is the configuration to evaluate.
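To put those sizes in hardware terms, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. The parameter counts come from the figures above; the numeric formats are assumptions for illustration (this article does not confirm which precision the checkpoints ship in), and the estimate ignores activations, caches, and framework overhead, so real requirements will be higher.

```python
# Rough weight-only memory estimate for the two Cosmos 3 sizes.
# Parameter counts are from the article; the dtype choices are assumptions,
# since the shipped checkpoint precision is not confirmed here.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

MODELS = {"Cosmos 3 Super": 64e9, "Cosmos 3 Nano": 8e9}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return gigabytes (10^9 bytes) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, dtype)
            print(f"{name:15s} {dtype:5s} {gb:7.1f} GB")
```

At 2 bytes per parameter, the 8B Nano needs roughly 16 GB for weights and the 64B Super roughly 128 GB, which is the arithmetic behind the workstation-versus-cloud split.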
[Figure: Architecture approach, Cosmos 3 MoT vs. prior pipeline]
The licensing matters and requires careful reading. NVIDIA describes the release as using the OpenMDW-1.1 License, which it characterizes as a Linux Foundation framework designed to permit commercial customization. That framing is plausible: the Linux Foundation’s Open Model and Dataset Weights initiative is a known licensing framework. But the specific terms haven’t been confirmed from independent documentation in this reporting cycle. Verify the license directly before building commercial workflows on top of Cosmos 3 weights.
The part nobody mentions in the announcement materials: NVIDIA’s performance rankings (first among open-source models on Artificial Analysis for text-to-image and image-to-video, first on RoboArena for policy benchmarks) are vendor-stated claims from the technical report. Independent evaluation hasn’t been published. Don’t make deployment decisions based on those rankings until a third party reproduces them. The model’s architecture is confirmed and credible; the benchmark position is not.
Where this lands in the competitive picture: Google’s Gemma 4 12B local agentic stack (covered by TJS earlier this week) targets a different problem: local, on-device agentic workflows with language capability. Cosmos 3 targets physical world understanding and robot policy. These aren’t competing for the same use case. But enterprise teams evaluating “open-source AI infrastructure” as a category need to distinguish between them clearly.
What to Watch
The first independent benchmark reproduction of Cosmos 3 on Artificial Analysis or a comparable platform will either confirm or complicate NVIDIA’s rankings claim. That’s the inflection point. For robotics teams, the RoboArena position matters most; watch for third-party policy model evaluations over the next four to eight weeks.
The TJS read: Cosmos 3 is now deployable, not just announced. If your team works in robotics, simulation, or multi-modal physical AI, download the checkpoints and run your own task-specific evaluation against Nano first. The license framing suggests commercial use is intended, but confirm the terms yourself before committing a production pipeline to the weights. Independent benchmarks aren’t here yet. Build a proof of concept; don’t build infrastructure.
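For teams taking that advice, a task-specific evaluation can start as small as the sketch below: run a fixed set of seeded episodes, count successes, and report a confidence interval so that a small sample is not mistaken for a ranking. Everything here is hypothetical scaffolding. The stub_episode function is a placeholder with no connection to the Cosmos 3 code or API, and it would need to be replaced with a real rollout of the model on your own task.

```python
import math
import random
from typing import Callable, Sequence


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate(run_episode: Callable[[int], bool], seeds: Sequence[int]) -> dict:
    """Run one episode per seed and summarize the success rate."""
    successes = sum(1 for seed in seeds if run_episode(seed))
    low, high = wilson_interval(successes, len(seeds))
    return {
        "episodes": len(seeds),
        "success_rate": successes / len(seeds),
        "ci95": (low, high),
    }


def stub_episode(seed: int) -> bool:
    """Hypothetical placeholder: swap in a real model rollout on your task.

    This stub just simulates a policy that succeeds about 70% of the time.
    """
    return random.Random(seed).random() < 0.7


if __name__ == "__main__":
    print(evaluate(stub_episode, range(50)))
```

Fixed seeds keep runs comparable across checkpoints, and the interval makes it obvious when fifty episodes are too few to separate two candidates.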