Gcore adds NVIDIA Dynamo to boost GPU efficiency and cut AI inference latency

Edge AI solutions provider Gcore has integrated NVIDIA Dynamo into its AI inference offerings, delivering up to 6x higher GPU throughput and 2x lower latency as a fully managed, one-click deployment.

NVIDIA Dynamo is an open-source inference framework designed to optimize large generative AI and reasoning models, addressing GPU efficiency, memory bottlenecks, and data-transfer overhead.

Gcore offers a ready-to-use, fully managed service for popular inference models, allowing deployment across public, private, hybrid, and on-premises environments.

“Modern inference isn’t just ‘run a model’ – it’s batching, routing, dynamic workloads, longer contexts, and tight SLOs,” says Seva Vayner, product director of edge cloud and AI at Gcore. “In that reality, small scheduling and utilization losses become big performance and cost penalties. By integrating Dynamo as a managed service in Gcore, we bring advanced GPU optimization directly into the runtime path so customers see higher effective throughput and steadier tail latency, without operating the complexity themselves.”

With Dynamo, customers simply activate the feature through the Gcore customer portal and do not have to handle complex GPU scheduling or routing themselves. Dynamo-powered inference is now available on Gcore Inference and Everywhere AI.

By optimizing resource allocation and inter-node communication, the integration improves GPU utilization, resulting in a more cost-effective solution with improved ROI.

Gcore will be providing in-person demonstrations this month at the MWC and GTC events.
