
High-density racks are straining data centers; liquid cooling may offer a solution


By Fred Rebarber, Senior Technical Director, Thermal Solutions, Vertiv

Over the past several years, companies all over the world have recognized the power of what artificial intelligence and digital transformation can do for an organization’s automation, supply chain and e-commerce initiatives, and there are no signs of turning back.

Research firm International Data Corporation (IDC) predicts the AI market will grow 15.2% year over year in 2021, reaching around $341.8B in spending. The firm surveyed enterprise IT buyers and reported that AI and ML workloads for edge computing environments will grow rapidly through 2025. While this is an exciting time for tech, the widespread adoption of AI could have serious implications for an organization’s data center.

To keep up with this rise in AI and digital transformation initiatives, more high-density racks are being deployed in data centers. Much like high-performance computing (HPC) deployments, AI applications require the ability to process massive amounts of data with extremely low latency. But unlike HPC, that data isn’t just in the form of text and numbers. AI applications often process data from heterogeneous sources in multiple forms, including large image, audio, and video files.

Packing the necessary processing power into server racks to support these growing AI applications is cranking up the heat in data centers across the globe and raising sustainability and energy efficiency concerns for some organizations. The proliferation of high-density server racks to keep up with AI demands is straining the capacity of existing power systems, and the current methods in place may no longer suffice.

Data center operators experiencing rising rack densities or planning to deploy high-density racks are beginning to consider different approaches to improving energy efficiency. One alternative that has gained traction in recent years is bringing liquid cooling to the rack. Liquid cooling leverages the higher thermal transfer properties of water or other fluids to support efficient and cost-effective cooling of high-density racks. For organizations trying to lessen the power strain brought on by the AI and digital transformation boom, it may be time to consider alternatives like liquid cooling to ensure that their energy and sustainability goals are met.

Understanding the Limits of Traditional Cooling Methods

Air cooling systems have continually evolved to address higher densities with greater efficiency, but there is a point at which air may not have the thermal transfer properties required to cool high-density racks efficiently. Beyond that point, air cooling can reduce the performance and reliability of specialized servers, and it becomes less energy efficient as rack power increases.

With more and more processing power being packed into servers supporting artificial intelligence and other processing-intensive applications, rack power requirements are exceeding 20 kilowatts (kW) in a growing number of facilities, and many organizations are now looking to deploy racks with requirements of 50 kW or more.
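To give a sense of scale for the rack densities mentioned above, the following is a minimal sketch of the sensible-heat balance Q = ρ · V · cp · ΔT solved for the airflow V. The air properties are approximate room-temperature values and the 12 K temperature rise across the rack is an assumption made for illustration; neither comes from the article, and the function name is hypothetical.

```python
# Sketch: airflow needed to carry away a rack's heat with air alone.
# Assumed (not from the article): air properties near room temperature
# and a 12 K inlet-to-outlet temperature rise across the rack.

AIR_DENSITY = 1.2          # kg/m^3, approximate, near 20 degrees C at sea level
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K), approximate
M3S_TO_CFM = 2118.88       # cubic feet per minute in one m^3/s


def required_airflow_m3s(rack_power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow that absorbs rack_power_w with a delta_t_k rise.

    Uses the sensible-heat balance Q = rho * V * cp * dT, solved for V.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return rack_power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)


if __name__ == "__main__":
    for rack_kw in (20, 50):
        flow = required_airflow_m3s(rack_kw * 1000, delta_t_k=12.0)
        print(f"{rack_kw} kW rack: {flow:.2f} m^3/s (~{flow * M3S_TO_CFM:.0f} CFM)")
```

Under these assumptions the required airflow scales linearly with rack power, so a 50 kW rack needs two and a half times the air of a 20 kW rack through the same cabinet footprint.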

Simultaneously, server manufacturers are packing more CPUs and GPUs into each rack unit (U). With multiple high-performance servers in a rack, systems that deliver cooling air to racks may not be able to provide adequate cooling capacity, even with containment. In addition, the strategy of spreading compute loads out is not feasible in processing-intensive applications because of the latency challenges created by the physical distance that exists even within a single server. As a result, components are being compacted within devices, creating ultra-dense 1U servers that are driving rack densities to unprecedented levels.

Organizations that ignore these limitations will likely experience increased energy costs and reduced performance as CPUs and GPUs throttle back their clock speeds to prevent overheating. However, introducing liquid cooling to the data center can provide a better solution for addressing these energy issues.

Types of Liquid Cooling Technology

Liquid cooling technology available today can efficiently cool racks of 50 kW and higher. It comes in several configurations, including rear-door heat exchangers, direct-to-chip cooling and immersion cooling.

Rear-door heat exchangers are a mature technology that doesn’t bring liquid directly to the server but does utilize the high thermal transfer properties of liquid. In a passive rear-door heat exchanger, a liquid-filled coil is installed in place of the rear door of the rack, and as server fans move heated air through the rack, the coil absorbs the heat before the air enters the data center. In an active design, fans integrated into the unit pull air through the coils for enhanced thermal performance.

In direct-to-chip liquid cooling, cold plates sit atop a server’s main heat-generating components to draw off heat through a single-phase or two-phase process. Single-phase cold plates use a cooling fluid looped into the cold plate to absorb heat from server components. In the two-phase process, a low-pressure dielectric liquid flows into evaporators, and the heat generated by server components boils the fluid. The heat is released from the evaporator as vapor and transferred outside the rack for heat rejection.

With immersion cooling, servers and other components in the rack are submerged in a thermally conductive dielectric fluid. In a single-phase immersion system, heat is transferred to the coolant through direct contact with server components and removed by heat exchangers outside the immersion tank. In two-phase immersion cooling, the dielectric fluid is engineered to have a specific boiling point that protects IT equipment but enables efficient heat removal. Heat from the servers changes the phase of the fluid, and the rising vapor is condensed back to liquid by coils located at the top of the tank.

Improved Efficiency and Beyond

Liquid cooling provides extremely efficient cooling since the cooling medium goes directly to the IT equipment rather than cooling the entire space. It can be up to 3,000 times more effective than using air, allowing the CPUs and GPUs in densely packed racks to operate continuously at their maximum voltage and clock frequency without overheating. This, combined with the reduction or elimination of fans required to move air across the data center and through servers, can create significant energy savings for liquid-cooled data centers. Additionally, the pumps required for liquid cooling consume less power than the fans needed to accomplish the same cooling.
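The article does not say how the "3,000 times" figure is derived. One physical quantity of that order is volumetric heat capacity, the heat a cubic metre of coolant absorbs per kelvin; treating it as the basis for the figure is an assumption. The sketch below compares water and air on that measure using textbook property values near room temperature. The `Coolant` class, the function name and the 10 K temperature rise are all hypothetical choices made for illustration.

```python
# Sketch: compare how much heat a given volume of water and of air can carry.
# Property values are textbook approximations near room temperature (assumed,
# not taken from the article).

from dataclasses import dataclass


@dataclass(frozen=True)
class Coolant:
    name: str
    density: float        # kg/m^3
    specific_heat: float  # J/(kg*K)

    @property
    def volumetric_heat_capacity(self) -> float:
        """Heat absorbed per cubic metre per kelvin, in J/(m^3*K)."""
        return self.density * self.specific_heat


AIR = Coolant("air", density=1.2, specific_heat=1005.0)
WATER = Coolant("water", density=998.0, specific_heat=4186.0)


def flow_for_load(coolant: Coolant, load_w: float, delta_t_k: float) -> float:
    """Volumetric flow (m^3/s) needed to absorb load_w with a delta_t_k rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return load_w / (coolant.volumetric_heat_capacity * delta_t_k)


if __name__ == "__main__":
    ratio = WATER.volumetric_heat_capacity / AIR.volumetric_heat_capacity
    print(f"water carries ~{ratio:.0f}x the heat of air per unit volume")
    # Assumed 10 K coolant temperature rise for a 50 kW rack.
    litres_per_min = flow_for_load(WATER, 50_000, 10.0) * 1000 * 60
    print(f"50 kW at a 10 K rise: ~{litres_per_min:.0f} L/min of water")
```

With these property values the ratio comes out in the low thousands, the same order of magnitude as the figure quoted above, though real systems also depend on heat-transfer coefficients, flow geometry and the fluid actually used.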

For organizations deploying high-density racks to support their digital transformation and AI initiatives, introducing liquid cooling to a data center can do much more than improve the facility’s energy efficiency. Below are several other ways a data center can benefit from transitioning to a liquid cooling system:

  • Improved Performance: A liquid cooling system not only supports the desired reliability but also delivers IT performance benefits. As CPU case temperatures approach the maximum safe operating temperature, as is likely to occur with air cooling, CPU performance is throttled back to avoid thermal runaway. Liquid cooling keeps case temperatures further from that limit, so processors can sustain their full clock speeds.
  • Sustainability: Not only does liquid cooling create opportunities to reduce data center energy consumption and drive power usage effectiveness (PUE) down to near 1.0, it also provides a more effective approach for repurposing captured heat to reduce the demand on building heating systems. The return-water temperature from the systems can be 140°F (60°C) or higher, and the liquid-to-liquid heat transfer is more efficient than is possible with air-based systems.
  • Maximize Space Utilization: The density enabled by liquid cooling allows a facility to better use existing data center space, eliminating the need for expansions or new construction, or to build smaller-footprint facilities. It also enables processing-intensive edge applications to be supported where physical space is limited.
  • Lower Cost of Ownership: ASHRAE conducted a detailed total cost of ownership (TCO) analysis of air-cooled data centers versus hybrid air- and liquid-cooled data centers and found that, while a number of variables can influence TCO, “liquid cooling creates the possibility for improved TCO through higher density, increased use of free cooling, improved performance and improved performance per watt.”
  • Lower Sound Levels: Liquid cooling can allow for the reduction or elimination of fans, which dramatically reduces noise levels in the data center. High-density air-cooled systems, when pushed to their limits, may require sound mitigation measures and limits on how long staff are exposed to high noise levels, per Occupational Safety and Health Administration (OSHA) noise standards.
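The PUE figure mentioned in the Sustainability point above is the ratio of total facility energy to the energy consumed by IT equipment, so a value of 1.0 would mean no overhead at all. The following is a minimal sketch of that ratio; the function name and every energy figure are hypothetical and chosen only to show how reducing cooling overhead moves the result toward 1.0.

```python
# Sketch: power usage effectiveness (PUE) = total facility energy / IT energy.
# All energy figures below are hypothetical, chosen only to illustrate the ratio.


def pue(it_energy_kwh: float, cooling_energy_kwh: float, other_overhead_kwh: float) -> float:
    """Return PUE for one measurement period."""
    if it_energy_kwh <= 0:
        raise ValueError("it_energy_kwh must be positive")
    total = it_energy_kwh + cooling_energy_kwh + other_overhead_kwh
    return total / it_energy_kwh


if __name__ == "__main__":
    # Hypothetical facility: same IT load, two different cooling overheads.
    print(f"higher cooling overhead: PUE = {pue(1000.0, 400.0, 100.0):.2f}")
    print(f"lower cooling overhead:  PUE = {pue(1000.0, 80.0, 100.0):.2f}")
```

Because IT energy sits in the denominator, any energy saved on fans and chillers lowers the ratio directly, while overhead outside the cooling system (power distribution losses, lighting) keeps it above 1.0.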

Some data center operators remain skeptical of liquid cooling because of the perceived risks of bringing liquid to the rack, but current-generation liquid cooling technologies can be deployed to minimize both the risk of leaks and the potential consequences of any leaks that do occur. If operators integrate risk mitigation into every step of the system design process (fluid selection, distribution methods, leak detection installation, etc.), the risks associated with liquid cooling are far outweighed by the benefits the system provides.

In the future, organizations looking to efficiently deploy extremely high-density racks (> 30 kW) in their data centers should begin to consider alternatives such as liquid cooling. But organizations must keep in mind that this requires careful planning and focused expertise, so it’s important to find the right partner to ensure the transition to liquid cooling is a success.

To learn more, read our white paper, “Understanding Data Center Liquid Cooling Options and Infrastructure Requirements.”

About the Author

Fred Rebarber is senior technical director of Thermal Solutions for Vertiv.

DISCLAIMER: Guest posts are submitted content. The views expressed in this post are those of the author, and don’t necessarily reflect the views of Edge Industry Review (EdgeIR.com).
