The silicon arms race at the edge: A look at the specialized processors powering AI inference

The AI processor market is undergoing rapid transformation. Leading chip designers and hyperscalers are racing to produce the fastest, most efficient processors to power AI training and inference. The largest tech companies are investing tens of billions of dollars to develop semiconductors capable of meeting the demands of generative AI. This article explores the current state of AI chip design, the power and cooling challenges these processors create and the technologies likely to be deployed at scale over the next few years.
Generative AI drives specialized hardware
Generative AI has prompted a surge of new companies and applications. Bloomberg projects the sector could reach $1.3 trillion by 2032. Amazon is committing $150 billion to data centers to support that growth, Google aims to invest $25 billion, and Microsoft and OpenAI plan a $100 billion AI supercomputer. These investments hinge on access to specialized processors.
Google’s Ironwood TPU delivers 42.5 exaflops at full pod scale, with 4,614 teraflops per chip, 192 gigabytes of high-bandwidth memory and 7.37 terabytes per second of memory bandwidth. It doubles performance per watt relative to the previous TPU generation and, by Google’s measure, is 24 times more powerful than the world’s fastest supercomputer, El Capitan, which delivers 1.7 exaflops.
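Those headline figures are internally consistent. As a quick back-of-the-envelope sketch (the per-chip and pod-level numbers are the ones quoted above; the chip count is derived, and lines up with the 9,216-chip pods Google has described):

```python
# Sanity check: pod-level compute divided by per-chip compute
# implies the number of Ironwood chips in a full pod.
pod_flops = 42.5e18     # 42.5 exaflops at full pod scale
chip_flops = 4_614e12   # 4,614 teraflops per chip

chips_per_pod = pod_flops / chip_flops
print(f"Implied chips per pod: {chips_per_pod:,.0f}")
# ~9,211 -- consistent with Google's 9,216-chip pod configuration.
```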
NVIDIA’s Rubin CPX graphics processing units (GPUs) can achieve 30 petaflops on a single die and, when scaled across NVL144 racks, deliver eight exaflops, enabling long-context generative AI tasks. These architectures optimize performance while lowering operational costs, providing a clear ROI for enterprises deploying large-scale AI workloads.
NVIDIA’s market position and industry response
NVIDIA has become the default supplier for AI infrastructure. The Hopper architecture, paired with the mature CUDA ecosystem, enabled scalable generative AI and positioned the Santa Clara-based vendor to capture over 80% of the AI chipset market. In 2023, even as the chip shortage that began in 2020 eased, hyperscalers were procuring H100 GPUs at lead times of up to 52 weeks, a sign of both surging demand and lingering supply constraints.
Competitors are pursuing alternatives. Google trains Gemini on custom TPUs, reducing its reliance on NVIDIA. Microsoft uses NVIDIA hardware via OpenAI while building its own Azure Maia AI chips and Cobalt CPUs. Amazon combines NVIDIA partnerships with in-house chips and its investment in Anthropic. Meta now deploys custom AI chips. AMD’s MI300 GPUs and Intel’s Gaudi 3 accelerators offer cost-effective options when flexibility outweighs proprietary ecosystems.
The leading vendor counters with the Blackwell GPU, which NVIDIA says delivers up to 25 times lower cost and energy consumption for trillion-parameter large language model inference than the previous generation. Blackwell’s software ecosystem, reference architectures and partnerships position it for broad adoption. NVIDIA has also moved into designing custom chips for other organizations, targeting a market estimated at $30 billion, illustrating the industry’s mix of competition and collaboration.
Thermal design and liquid cooling
Specialized AI processors generate far more heat than traditional servers. The AMD Instinct MI300X GPU draws up to 750 watts per unit.
That means a typical server equipped with four MI300X GPUs consumes approximately 3,000 watts, excluding CPUs and memory. Scaling this to a 20-server rack results in roughly 60,000 watts of power draw, not accounting for other components.
On top of that, NVIDIA’s B200 GPUs can draw up to 1,200 watts per chip. These power demands exceed the capacity of conventional thermal management, prompting data centers to adopt liquid cooling solutions.
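As a minimal sketch of that arithmetic (the four-GPU server and 20-server rack are the article’s illustrative figures, not a specific vendor configuration):

```python
# GPU-only rack power using the example figures above.
# Excludes CPUs, memory, fans and other components.
GPU_WATTS_MI300X = 750   # AMD Instinct MI300X maximum draw
GPU_WATTS_B200 = 1_200   # NVIDIA B200 maximum draw
GPUS_PER_SERVER = 4      # illustrative server layout
SERVERS_PER_RACK = 20    # illustrative rack density

def rack_gpu_watts(watts_per_gpu: int) -> int:
    """GPU-only power draw for a full rack."""
    return watts_per_gpu * GPUS_PER_SERVER * SERVERS_PER_RACK

print(rack_gpu_watts(GPU_WATTS_MI300X))  # 60000 W, as cited above
print(rack_gpu_watts(GPU_WATTS_B200))    # 96000 W with B200-class chips
```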
Liquid cooling is essential for high-performance workloads. It transfers heat up to 30 times more efficiently than air, reduces energy consumption, enables processors to sustain peak performance and extends chip longevity. Liquid-cooled racks, such as NVIDIA’s GB200 NVL72, can support roughly 120 kilowatts of GPU load, compared to about 30 kilowatts for air-cooled racks.
Cooling technologies
Two liquid cooling methods dominate: immersion and direct-to-chip. Immersion submerges components in a dielectric liquid, either single-phase or two-phase, but requires significant infrastructure overhaul, server modifications and staff retraining.
Direct-to-chip delivers coolant to hot spots via cold plates. Single-phase cold plates are simpler, scalable and cost-efficient, while two-phase designs offer higher heat capacity at the cost of added complexity and fluids that raise toxicity concerns.
Of the two, single-phase direct-to-chip is leading adoption among hyperscalers. JetCool’s patented microconvective technology targets hot spots directly, eliminates thermal interface materials, supports inlet coolant temperatures up to 60 degrees Celsius and efficiently cools high-TDP processors. The company reports power usage effectiveness (PUE) as low as 1.02 in NVIDIA GPU environments and support for processors drawing over 1,500 watts per socket. Advanced solutions like SmartSilicon integrate thermal design into chip substrates, allowing even 4-kilowatt processors to operate efficiently.
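For context, PUE is total facility power divided by IT equipment power, so a PUE of 1.02 implies only about 2 watts of cooling and power-delivery overhead per 100 watts of IT load. A minimal illustration (the 1.5 comparison value is an assumed, roughly industry-typical figure, not from the article):

```python
# Power usage effectiveness: total facility power / IT equipment power.
def facility_power(it_watts: float, pue: float) -> float:
    """Total facility power implied by a given PUE."""
    return it_watts * pue

it_load = 100_000                     # 100 kW of IT equipment
print(facility_power(it_load, 1.02))  # 102,000 W: ~2 kW of overhead
print(facility_power(it_load, 1.5))   # 150,000 W at an assumed typical PUE
```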
Specialized processors as the cornerstone of edge intelligence
Generative AI is transforming edge data centers and driving demand for application-specific chips and advanced cooling solutions. As hyperscalers and chipmakers compete in this silicon arms race, technologies once confined to science fiction are becoming reality. A race with no finish line in sight underscores AI’s momentum and influence today.
About the author
Ellie Gabel is a freelance writer as well as an associate editor for Revolutionized.com. She’s passionate about covering the latest innovations in science and tech and how they’re impacting the world we live in.