California-based Rafay Systems has announced an update to its turnkey solution for operating Kubernetes clusters, adding GPU support at scale.
The upgrade also introduces additional metrics and dashboards designed to give deeper visibility into GPU health and performance.
The new capabilities in the Kubernetes Operations Platform (KOP) are aimed at firms struggling with visibility and monitoring, challenges that can delay application deployment and waste money on idle or underperforming GPUs in their clusters.
The platform also eliminates the need to have GPU-enabled workloads developed and maintained by external parties.
With KOP’s additional metrics and dashboards, developer and operations teams can now monitor, operate, and tune GPU-based container workloads in-house, from a centralized interface.
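For context, GPU-based container workloads on Kubernetes are typically scheduled through the standard device-plugin resource API, independent of any particular management platform. A minimal sketch of a pod requesting a single Nvidia GPU (the pod name and container image here are illustrative, not taken from Rafay's announcement):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference          # example name
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/pytorch:23.10-py3   # example CUDA-enabled image
    resources:
      limits:
        nvidia.com/gpu: 1     # schedule onto a node exposing a free GPU
```

Platforms like KOP sit above this layer, provisioning the GPU-enabled clusters and collecting the health and utilization metrics that such workloads generate.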
The KOP’s new dashboard. Image credit: Rafay Systems.
According to Rafay Systems, the collection of GPU metrics is entirely automated, and access to the data is managed via dedicated corporate single sign-on (SSO) systems with role-based rules.
“Rafay makes spinning up GPU-enabled Kubernetes clusters incredibly simple. In just a few steps an enterprise’s deep learning and inference projects can be fully operational,” said Mohan Atreya, SVP Product and Solutions at Rafay Systems.
“Not only do we provide the fastest path to powering environments for AI and machine learning applications, but the combination of capabilities in Rafay KOP enables scalable edge/remote use cases with support for zero-trust access, policy management, GPU monitoring, and more across an entire fleet of thousands of clusters.”
The KOP upgrade comes months after Rafay Systems raised $25 million in a Series B funding round led by ForgePoint Capital.
More recently, the company announced it had doubled total annual recurring revenue since the investment.
Rafay’s announcement highlights the confluence of two trends: Kubernetes at the edge and GPUs for AI. At present, enterprise developers mostly run Kubernetes in centralized private and public cloud deployments, which is also where cloud providers make plenty of powerful GPUs available.
A recent study suggested that Kubernetes will become the platform of choice for running AI and ML workloads within the next two years. Separate research has found that AI and ML are among the workloads enterprises most frequently cite as candidates to run at the edge, close to their data sources.
Nvidia, meanwhile, has been pushing development of GPUs for edge AI applications, even as numerous startups develop novel chip architectures for edge AI. Adding GPU support to Rafay’s managed Kubernetes platform highlights a pathway for Kubernetes and AI to move from the data center to the edge, in our view.
Jim Davis, editor, EdgeIR.com
edge AI | edge orchestration | GPU | Kubernetes | managed service | Nvidia | Rafay Systems