
Nvidia CEO Jensen Huang unveils Dynamo, an open-source AI inference framework, at GTC 2025

AI chipmaker Nvidia on Tuesday (March 18, 2025) unveiled Dynamo, an open-source inference framework designed to enhance the deployment of generative AI and reasoning models across large-scale, distributed environments.

Announced at the GTC 2025 conference, Dynamo aims to significantly boost inference performance while reducing operational costs for AI applications.

Dynamo introduces several techniques to speed up AI inference. It disaggregates the prefill and decode stages of inference so they can run on separate GPUs, allowing each stage to be optimised independently and each GPU to handle more work. It also schedules GPUs dynamically so that hardware is used efficiently, optimises how data is transferred between GPUs to reduce response times, and moves the KV cache between memory tiers to further improve throughput.
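The disaggregation idea described above can be illustrated with a minimal sketch. This is not Dynamo's actual API; the worker functions, `Request` class, and placeholder KV-cache entries below are hypothetical, and real systems transfer cache tensors between GPUs rather than passing Python objects through queues. The sketch only shows the structure: a prefill stage processes the whole prompt once and hands its KV cache to a separate decode stage, which then generates tokens one at a time.

```python
# Conceptual sketch of disaggregated prefill/decode serving.
# All names here are illustrative, not part of Dynamo's API.
from dataclasses import dataclass, field
import queue
import threading

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # stand-in for per-token KV entries
    output: list = field(default_factory=list)

def prefill_worker(in_q: queue.Queue, handoff_q: queue.Queue) -> None:
    """Process the full prompt once, building the KV cache (compute-bound stage)."""
    while True:
        req = in_q.get()
        if req is None:
            break
        req.kv_cache = [f"kv({tok})" for tok in req.prompt.split()]
        handoff_q.put(req)  # hand the KV cache off to the decode pool

def decode_worker(handoff_q: queue.Queue, done_q: queue.Queue) -> None:
    """Generate tokens one by one, reusing the transferred cache (memory-bound stage)."""
    while True:
        req = handoff_q.get()
        if req is None:
            break
        for i in range(req.max_new_tokens):
            req.output.append(f"tok{i}")        # placeholder for a real decode step
            req.kv_cache.append(f"kv(tok{i})")  # the cache keeps growing during decode
        done_q.put(req)

in_q, handoff_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=prefill_worker, args=(in_q, handoff_q), daemon=True).start()
threading.Thread(target=decode_worker, args=(handoff_q, done_q), daemon=True).start()

in_q.put(Request(prompt="the quick brown fox", max_new_tokens=3))
result = done_q.get()
print(len(result.kv_cache))  # 4 prompt entries + 3 generated entries = 7
```

Because the two stages have different resource profiles, splitting them this way lets an operator scale prefill and decode capacity independently, which is the core of the throughput gains Nvidia describes.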

In practical applications, Nvidia claims, Dynamo has demonstrated substantial performance improvements. For instance, when serving the open-source DeepSeek-R1 671B reasoning model on NVIDIA’s GB200 NVL72 platform, Dynamo increased the number of requests served by up to 30 times. This enhancement positions Dynamo as a cost-effective solution for AI firms aiming to maximise token revenue generation.

The framework supports major AI inference backends, including PyTorch, SGLang, NVIDIA TensorRT-LLM, and vLLM, providing developers and AI researchers with the flexibility to integrate Dynamo into diverse AI workflows.

For enterprises seeking accelerated deployment and enterprise-grade support, Nvidia plans to include Dynamo with its NIM microservices, which are part of the NVIDIA AI Enterprise suite. This integration is expected to facilitate faster time to production while ensuring security and stability in AI operations.

Published - March 19, 2025 12:44 pm IST