Gcore adds NVIDIA Dynamo to boost GPU efficiency and cut AI inference latency

Edge AI solutions provider Gcore has integrated NVIDIA Dynamo into its AI inference offerings, delivering up to 6x higher GPU throughput and 2x lower latency as a fully managed, one-click deployment.

NVIDIA Dynamo is an open-source inference framework designed to optimize large generative AI and reasoning models, addressing GPU efficiency, memory bottlenecks, and data-transfer overhead.

Gcore offers a ready-to-use, fully managed service for popular inference models, allowing deployment across public, private, hybrid, and on-premises environments.

“Modern inference isn’t just ‘run a model’ – it’s batching, routing, dynamic workloads, longer contexts, and tight SLOs,” says Seva Vayner, product director of edge cloud and AI at Gcore. “In that reality, small scheduling and utilization losses become big performance and cost penalties. By integrating Dynamo as a managed service in Gcore, we bring advanced GPU optimization directly into the runtime path so customers see higher effective throughput and steadier tail latency, without operating the complexity themselves.”

With Dynamo, customers simply activate the feature through the Gcore customer portal and do not have to handle complex GPU scheduling or routing themselves. Dynamo-powered inference is now available on Gcore Inference and Everywhere AI.

By optimizing resource allocation and inter-node communication, the integration improves GPU utilization, resulting in a more cost-effective solution with improved ROI.

Gcore will be providing in-person demonstrations this month at the MWC and GTC events.
