Nvidia is determined to integrate AI into the real world with new models unveiled at CES 2026. The company introduced Cosmos Reason 2, an updated vision-language model for embodied reasoning, alongside an expanded Nemotron family that aims to supercharge robotic intelligence.
Nvidia’s Cosmos Reason 2 aims to bring reasoning VLMs into the physical world
Key Takeaways:
- Nvidia introduced Cosmos Reason 2 to enable AI agents to plan real-world actions.
- The company’s fresh slate of models goes beyond chat-based interfaces.
- New Nemotron models focus on speech, data retrieval, and safety features.
- Nvidia emphasizes open, customizable AI ecosystems.
- The roadmap envisions full integration of digital and physical AI.
The Age of Physical AI
Nvidia CEO Jensen Huang declared last year that “we are now entering the age of physical AI.” Building on this vision, the company has shifted from simpler language models to solutions that can reason and operate in the real world. At CES 2026, Nvidia announced a new set of products to deepen AI’s ability to function in physical spaces, from robotic assistance to agentic decision-making in factories or warehouses.
Cosmos Reason 2 and Its Embodied Capabilities
The centerpiece of Nvidia’s latest offerings is Cosmos Reason 2, an advanced vision-language model designed for embodied reasoning. Its predecessor, Cosmos Reason 1, introduced a two-dimensional ontology that propelled it to the top of Hugging Face’s physical reasoning for video leaderboard. Building on that success, Cosmos Reason 2 gives businesses more room for customization and lets robots and other AI agents plan their next steps in unpredictable environments.
Kari Briski, Nvidia’s vice president for generative AI software, explains: “These new robots combine broad fundamental knowledge with deep proficiency and complex tasks.” She emphasizes that Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate the unpredictable physical world.”
Nemotron Family Grows
In addition to the Cosmos updates, Nvidia unveiled three major additions to its Nemotron lineup: Nemotron Speech, Nemotron RAG, and Nemotron Safety. According to the company, Nemotron Speech provides “real-time low-latency speech recognition” and is 10 times faster than other models in its category. Nemotron RAG targets multimodal retrieval, pairing an embedding model with a rerank model to handle text and images efficiently. Nemotron Safety is designed to detect sensitive data and prevent accidental disclosure of personally identifiable information.
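Pairing an embedding model with a rerank model is a common two-stage retrieval pattern: a cheap similarity search narrows the corpus to a few candidates, then a more careful scorer reorders them. The Python sketch below illustrates that general pattern only. It is text-only, and every function in it (embed, retrieve, rerank_score, retrieve_then_rerank) is a hypothetical toy stand-in, not part of Nemotron RAG or any Nvidia API.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k):
    """Stage 1: rank every document by embedding similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def rerank_score(query, doc):
    """Toy stand-in for a rerank model: share of distinct query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_then_rerank(query, docs, k=3, top_n=1):
    """Stage 2: reorder the stage-1 candidates with the rerank scorer."""
    candidates = retrieve(query, docs, k)
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]


if __name__ == "__main__":
    corpus = [
        "robot arm picks boxes in the warehouse",
        "speech recognition with low latency",
        "warehouse safety rules for robots",
    ]
    print(retrieve_then_rerank("warehouse robot arm", corpus, k=2, top_n=1))
```

In a production system the two toy scorers would be replaced by learned models, and the candidates could include image embeddings alongside text. The split matters because the reranker is usually far more expensive per document than the first-stage search, so it only sees the short candidate list.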
These developments build on Nemotron 3, the agentic reasoning model that debuted in December. The expanded family highlights Nvidia’s ambition to develop AI tools that think, speak, and function responsibly in the real world.
Building an Open AI Ecosystem
Nvidia underscores its open approach by providing models, datasets, training scripts, and example blueprints to encourage customization. “In building specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles, more than just the model is needed,” Briski said. She points out how a complete open ecosystem—from compute resources to large-scale data—ensures that developers can tailor each AI system to their specific applications.
Looking Ahead
With this suite of models, Nvidia envisions a shared enterprise ecosystem that unites digital and physical agents. Cosmos Reason 2, Nemotron Speech, Nemotron RAG, and Nemotron Safety all aim to meet the real-time demands of next-generation robotics and agentic AI. By integrating vision, language, speech, and safety checks into a single framework, Nvidia positions itself at the forefront of a new era where AI no longer lives only in code—it thrives amid the challenges of the physical world.