Microsoft is testing a cooling technology known as submersion cooling that could make it more efficient to run high-performance, low-latency workloads such as machine learning and artificial intelligence applications at the edge. Microsoft is not the first company to experiment with submersion cooling, but it claims to be the first to use two-phase immersion cooling in a production cloud service environment.
Using liquids to cool computers has a history that goes back to the first IBM mainframes. Submersion cooling is a technique pioneered in the world of supercomputers by companies like Cray and one that has since found traction in applications such as cryptocurrency mining. Servers are placed in a tank of dielectric cooling fluid (in this case supplied by 3M) that boils at a temperature far lower than that of water. As the liquid boils, heat is carried away from the servers, which means they can operate at full power without risk of failure through overheating. In a blog post, Microsoft said this technology will make it easier to run demanding applications in edge locations such as the base of a 5G tower.
Liquid cooling is widely used elsewhere in engineering. Most cars, for example, use it to prevent engine overheating. Applying the same approach to chips running in servers makes sense because chips draw ever-increasing amounts of power to push performance upward. Now that transistor widths have shrunk to the scale of a few atoms, we are approaching the point where Moore's Law, which says the number of transistors on a chip will double roughly every two years, will cease to apply. But while we may be reaching the physical limits of chip architecture, demand keeps growing, and manufacturers have turned to raw power to improve performance.
Microsoft says typical CPU power usage has doubled from 150 watts per chip to 300 watts, and GPUs can draw upwards of 700 watts. But the more power that passes through a chip, the hotter it runs and the higher the risk of malfunction. Air cooling has long been the answer to this problem, but it is "no longer enough," especially for artificial intelligence workloads.
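To get a feel for the heat loads involved, a back-of-the-envelope sketch can estimate how much dielectric fluid those chips vaporize per second. The latent-heat figure below is an assumption, not from the article: roughly 100 kJ/kg is typical of fluorocarbon immersion coolants. In a sealed two-phase tank the vapor condenses and rains back down, so the fluid is recycled rather than consumed.

```python
# Rough sketch: how much coolant a chip boils off per second at a given
# heat load. Assumes (not from the article) a latent heat of vaporization
# of ~100 kJ/kg, typical of fluorocarbon immersion coolants, and that all
# of the chip's power goes into vaporizing the fluid.

LATENT_HEAT_J_PER_KG = 100_000  # assumed latent heat of vaporization

def boil_off_rate(power_watts: float) -> float:
    """Mass of fluid vaporized per second (kg/s) for a given heat load."""
    return power_watts / LATENT_HEAT_J_PER_KG

# Using the article's figures: a 300 W CPU and a 700 W GPU.
print(f"300 W CPU: {boil_off_rate(300) * 1000:.1f} g/s")  # 3.0 g/s
print(f"700 W GPU: {boil_off_rate(700) * 1000:.1f} g/s")  # 7.0 g/s
```

Even a single 700 W GPU continuously vaporizes several grams of fluid per second, which is why the condensing loop at the top of the tank matters as much as the liquid bath itself.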
(Hardware engineers on Microsoft’s team for data center advanced development inspect the inside of a two-phase immersion cooling tank at a Microsoft data center. Source: Microsoft)
Microsoft says immersion cooling will enable Moore's Law to continue at the data center level. The expected reduction in failure rates also means it may not be necessary to replace components immediately when they fail, which would allow deployment in remote, hard-to-service areas.
Microsoft is currently running a tank in a hyperscale data center, but it also envisions tanks at the base of 5G cell towers, serving applications such as self-driving cars. It's easy to imagine submersion systems being deployed in many other classic edge locations, such as oil and gas wells or factories, where maintaining system reliability and energy efficiency is critical to operations. To that end, keep an eye on companies like Submer Technologies and GRC (Green Revolution Cooling), which have been commercializing immersion cooling systems that will benefit from the exposure Microsoft has brought to the topic.