Moore’s law slowdown necessitates larger and more power-hungry HPC and AI facilities


  • The pursuit of more powerful HPC and AI clusters faces challenges as Moore’s law slows down.
  • Larger facilities with higher power consumption and cooling demands are needed for increased performance.
  • Sustainability experts emphasize the importance of Power Usage Effectiveness (PUE) while considering its broader environmental impacts.
  • Evaporative cooling systems, while efficient, consume significant water resources, raising ethical concerns.
  • Strategic location choices, green energy sources, and waste heat utilization can mitigate environmental impact.
  • A dynamic utilization approach can lead to substantial reductions in power costs and carbon emissions.
  • Improved reporting and consistency in sustainability metrics are essential for accountability in the industry.

Main AI News:

In today’s fast-paced world of technology, the pursuit of ever more powerful High-Performance Computing (HPC) and Artificial Intelligence (AI) clusters has become a challenging endeavor. Moore’s law, which once promised exponential growth in computing power, is slowing down. This slowdown necessitates the construction of larger and more energy-intensive facilities to meet the increasing demand for performance.

University of Utah professor Daniel Reed, speaking at the recent SC23 supercomputing conference in Denver, pointed out the conundrum: “If you want more performance, you need to buy more hardware, and that means a bigger system; that means more energy dissipation and more cooling demand.”

The largest supercomputing clusters today consume over 20 megawatts of power. Some projections even indicate that by 2027, a capability-class supercomputer could require a staggering 120 megawatts of power.

Addressing the sustainability challenges in high-performance computing was the focus of a panel discussion featuring experts from the University of Chicago, Schneider Electric, Los Alamos National Laboratory, Hewlett Packard Enterprise, and the Finnish IT Center for Science. Their insights shed light on how we should plan, deploy, report, and operate these facilities in the future.


One of the central themes of the discussion revolved around Power Usage Effectiveness (PUE), a crucial metric of datacenter efficiency calculated as total facility power divided by the power delivered to the IT equipment itself. The closer the PUE is to 1.0, the less power the facility spends on overhead such as cooling and power delivery.
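To make the metric concrete, here is a minimal sketch of the PUE calculation; the 20 MW cluster and 4 MW overhead figures are illustrative assumptions, not numbers from the panel.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# Hypothetical 20 MW cluster with 4 MW of cooling and power-delivery overhead.
print(round(pue(24_000, 20_000), 2))  # 1.2
```

Note that a facility can post a PUE near 1.0 while still consuming vast amounts of water, which is exactly the blind spot the panelists criticized.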

However, Nicolas Dubé of HPE highlighted a concerning trend among some large datacenter operators. He criticized hyperscalers for building datacenters in arid regions like Arizona and New Mexico, where evaporative cooling systems lead to impressive PUE numbers but consume vast amounts of precious water resources.

Dubé remarked, “You build datacenters there, and if you use evaporative cooling, you’re going to have spectacular PUE. However, you’re going to consume a resource that’s way more important to that community than optimizing for a few percent of the energy consumption. I think that’s criminal.”

Evaporative cooling systems are highly efficient but require substantial water usage. Genna Waldvogel of Los Alamos highlighted their approach of using reclaimed water to minimize the impact of evaporative cooling.

The significant water consumption associated with evaporative cooling is forcing datacenter operators to be more considerate of their system placement choices.


Dubé emphasized the importance of selecting locations with abundant green energy sources to mitigate the environmental impact of generative AI and datacenters. He cited a 100-megawatt datacenter in Quebec, where nearly 100 percent of power comes from renewable sources like hydro and wind. Dubé suggested that large-scale workloads should be relocated to areas where sustainability aligns with computing needs.

Moreover, datacenter facilities could harness the waste heat generated for productive use. The QScale facility in Quebec, for instance, plans to utilize waste heat to warm nearby agricultural greenhouses.

Dubé humorously illustrated the potential by asking, “Just how many tomatoes can you grow just by training GPT-3 once?” His calculations revealed an impressive yield, showing the opportunity for sustainable heat reuse.
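Dubé’s exact numbers were not given, but the back-of-envelope arithmetic can be sketched as follows. All three figures below are assumptions for illustration: roughly 1,300 MWh for one GPT-3 training run (a commonly cited estimate), about 10 kWh of greenhouse heat per kilogram of tomatoes in a cold climate, and an 80 percent heat-recovery rate.

```python
# Back-of-envelope estimate of heat-reuse tomato yield. All constants
# are illustrative assumptions, not figures from Dubé's talk.
TRAINING_ENERGY_KWH = 1_300_000   # assumed energy of one GPT-3 training run
HEAT_PER_KG_KWH = 10              # assumed greenhouse heat demand per kg of tomatoes
REUSE_FRACTION = 0.8              # assumed share of waste heat actually recovered

tomatoes_kg = TRAINING_ENERGY_KWH * REUSE_FRACTION / HEAT_PER_KG_KWH
print(f"{tomatoes_kg / 1000:.0f} tonnes of tomatoes")  # 104 tonnes
```

Even with generous error bars on each assumption, the result lands in the tens-to-hundreds-of-tonnes range, which is the point of the quip: the waste heat is far from negligible.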


Andrew Chien of the University of Chicago’s CERES Center for Unstoppable Computing proposed a dynamic approach to improve datacenter sustainability. Instead of running HPC clusters at constant capacity, operators could adjust utilization based on the availability of sustainable energy sources in the grid.

Chien’s analysis suggested that implementing this approach could lead to a 90 percent reduction in power costs and a 40 percent reduction in carbon emissions for projects like “Fugaku Next” at RIKEN Lab in Japan, which is expected to come online in the near future.
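Chien’s dynamic approach can be sketched as a simple control rule: scale the cluster’s utilization target with the renewable share of the current grid mix, rather than running flat-out around the clock. The floor and mapping below are illustrative assumptions, not Chien’s actual model.

```python
# Sketch of grid-aware throttling: the cluster's utilization target tracks
# the renewable fraction of the grid mix. Thresholds are illustrative.
def target_utilization(renewable_fraction: float,
                       floor: float = 0.3, ceiling: float = 1.0) -> float:
    """Map the grid's renewable share to a cluster utilization target,
    never dropping below a floor that keeps long-running jobs alive."""
    if not 0.0 <= renewable_fraction <= 1.0:
        raise ValueError("renewable_fraction must be in [0, 1]")
    return max(floor, min(ceiling, renewable_fraction))

print(target_utilization(0.95))  # run near full capacity on a green grid
print(target_utilization(0.10))  # throttle to the floor on a dirty grid
```

In practice a scheduler would feed this target from a live grid-carbon-intensity signal and use it to defer or checkpoint flexible jobs; the hard part is deciding which workloads tolerate the resulting latency.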


Sustainability in the realm of HPC and AI necessitates better and more consistent reporting. Schneider Electric’s Robert Bunger called for the HPC community to lead in sustainability reporting. To address the current lack of consistency in sustainability metrics reporting, Schneider proposed 28 key metrics, including power consumption, PUE, renewable energy consumption, water use efficiency, and more.

While tracking all 28 metrics may be daunting for some, starting with a subset could help datacenter operators move towards greater accountability and sustainability in their operations.
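A starter subset of that kind could look like the record below. The field names are illustrative, not Schneider Electric’s official schema; they cover the metrics the article names: energy, PUE, renewable share, and water use.

```python
# Minimal sketch of a per-facility sustainability report covering a
# starter subset of reporting metrics. Field names are illustrative.
from dataclasses import dataclass, asdict

@dataclass
class SustainabilityReport:
    facility: str
    total_energy_mwh: float    # total facility energy over the period
    it_energy_mwh: float       # energy delivered to IT equipment
    renewable_fraction: float  # share of energy from renewable sources
    water_use_m3: float        # total water consumed, e.g. for cooling

    @property
    def pue(self) -> float:
        """Derived metric: total facility energy over IT energy."""
        return self.total_energy_mwh / self.it_energy_mwh

report = SustainabilityReport("example-dc", 12_000, 10_000, 0.6, 5_000)
print(round(report.pue, 2))  # 1.2
print(asdict(report)["water_use_m3"])  # 5000
```

Publishing even this small, consistently defined subset across facilities would make year-over-year and site-to-site comparisons possible, which is the accountability the panel called for.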


The pursuit of higher performance in HPC and AI must be balanced with sustainability and environmental responsibility. Strategic location choices, efficient cooling methods, dynamic utilization, and improved reporting can collectively pave the way for a more sustainable future in high-performance computing.