TL;DR:
- The Aurora supercomputer at Argonne National Laboratory, powered by Intel’s CPUs and GPUs, has overcome challenges to reach its final incarnation.
- The original Aurora machine faced delays and financial issues, but the new design offers improved capabilities.
- The final Aurora architecture resembles a modern AI training system and surpasses the original design inspired by IBM’s BlueGene.
- The emergence of AI forced Argonne and Intel to reconsider the Aurora design, leading to a hybrid CPU-GPU architecture.
- The delayed “Ponte Vecchio” GPU impacted the planned specifications, resulting in an estimated peak performance of just over 2 exaflops.
- The Aurora machine is expected to achieve top rankings on the Top500 list, with sustained performance between 1.31 exaflops and 1.41 exaflops on the High Performance Linpack benchmark.
- The net cost of $200 million for a 2 exaflops machine is considered a remarkable deal, compensating for previous challenges.
- The technical specifications of the Aurora 2023 machine include significant improvements in memory capacity and bandwidth.
- The Slingshot network in Aurora delivers slightly lower injection bandwidth than the originally planned Omni-Path fabric, but it spans a considerably larger machine and offers higher bi-section bandwidth.
- The AuroraGPT project aims to develop a generative large language model for scientific data analysis.
- Benchmark results comparing Aurora’s testbed machines to those with AMD and Nvidia GPUs provide insights into performance capabilities.
Main AI News:
As remarkable scientific breakthroughs unfold on the latest iteration of the “Aurora” supercomputer at Argonne National Laboratory, powered by Intel’s cutting-edge CPUs and GPUs, the initial tribulations encountered during its development may soon fade into distant memory. The scientific community, with its forward-thinking perspective, eagerly anticipates leveraging the advanced simulation and modeling capabilities of this state-of-the-art system.
The journey leading up to the current Aurora machine has been fraught with challenges. Originally announced in April 2015 and scheduled for delivery by the end of 2018, the first iteration of Aurora, built around Intel's many-core "Knights" processors rather than GPUs, never lived up to expectations. However, scientists are resilient and driven by insatiable curiosity, quickly adapting their work to the evolving technological landscape.
The silver lining lies in the fact that Argonne secured a $500 million deal to acquire a machine capable of pushing the boundaries of exascale computing—a monumental achievement. Remarkably, Intel Federal, the primary contractor for the Aurora project, incurred a $300 million writeoff in 2021, shedding light on the convoluted financial intricacies surrounding the endeavor. Argonne remains hopeful that the final cost will align with the initial budget of $200 million, thus securing a remarkable technological feat at a reasonable price.
The current incarnation of Aurora boasts an architecture that better suits the demands of the time. Strikingly resembling a state-of-the-art AI training system rather than its predecessor, which drew inspiration from IBM’s BlueGene/Q machine, this new design reflects Intel’s proactive response to the transformative impact of AI on the high-performance computing (HPC) landscape. Al Gara, the architect behind IBM’s BlueGene processors and the Knights CPU and coprocessor designs, played a pivotal role in shaping Aurora’s blueprint, resulting in a purpose-built system for modern scientific pursuits.
Intel embarked on its HPC endeavors in the mid-2010s with a visionary aspiration to create an X86 and InfiniBand-based supercomputing infrastructure that encapsulated the best aspects of the BlueGene philosophy. This approach was well-suited for traditional HPC workloads involving simulations and modeling. However, the emergence of AI in 2012 dramatically transformed the HPC landscape, compelling Argonne and Intel to reconsider their architectural choices for Aurora. The GPU-accelerated machines deployed at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory—namely, the “Summit” and “Sierra” systems, respectively—utilized a hybrid CPU-GPU architecture capable of accommodating AI training workloads alongside accelerated HPC tasks.
A significant turning point arrived when the "Ponte Vecchio" Max Series GPU encountered a delay in October 2021. Originally, the plan for Aurora involved over 9,000 nodes, each equipped with a pair of "Sapphire Rapids" Xeon SPs and six Ponte Vecchio GPUs, with each GPU expected to deliver 45 teraflops of double precision floating point performance. However, Intel subsequently raised the peak theoretical performance rating of a single Ponte Vecchio GPU to 52 teraflops in 2022, rendering the original projections obsolete.
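As a rough check on that original plan, the implied system peak can be sketched from the article's own figures; this assumes a floor of 9,000 nodes (the article says only "over 9,000") and treats 45 teraflops as a per-GPU rating:

```python
# Implied peak of the original Aurora plan, assuming 9,000 nodes (a floor,
# per "over 9,000") and 45 teraflops FP64 per Ponte Vecchio GPU.
nodes = 9_000
gpus_per_node = 6
tf_per_gpu = 45.0

peak_ef = nodes * gpus_per_node * tf_per_gpu / 1e6  # teraflops -> exaflops
print(f"Original plan implied peak: ~{peak_ef:.2f} exaflops")  # ~2.43 exaflops
```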
By extrapolating from the disclosed specifications for the Aurora machine (63,744 GPUs across 10,624 nodes), a tantalizing possibility emerged: an aggregate peak theoretical compute of 3.31 exaflops for the Aurora 2023 system. Intel initially referred vaguely to "in excess of 2 exaflops" of peak theoretical DP performance, and has since confirmed that Aurora will deliver just over 2 exaflops of peak double precision floating point capability. Notably, Intel added nodes to the system specifically to push it past the 2 exaflops threshold.
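The gap between those two figures is worth making explicit. Here is a minimal sketch of the arithmetic, using only numbers cited in this article (the 2,007 petaflops system peak appears in the specifications section below):

```python
# Peak FP64 arithmetic for Aurora 2023, using figures cited in the article.
gpus = 63_744                # Ponte Vecchio GPUs across 10,624 nodes
peak_per_gpu_tf = 52.0       # 2022 per-GPU peak FP64 rating, teraflops
stated_system_pf = 2_007.0   # Intel's disclosed system peak, petaflops

naive_peak_ef = gpus * peak_per_gpu_tf / 1e6        # teraflops -> exaflops
implied_per_gpu_tf = stated_system_pf * 1e3 / gpus  # petaflops -> teraflops

print(f"Peak at the full 52 TF rating: ~{naive_peak_ef:.2f} exaflops")   # ~3.31
print(f"Implied per-GPU rate at 2,007 PF: ~{implied_per_gpu_tf:.1f} TF") # ~31.5
```

The implied per-GPU rate of roughly 31.5 teraflops is consistent with the scaled-back, thermally constrained GPU configuration discussed later in this article.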
The High Performance Linpack benchmark test, which largely depends on the scaling behavior of the Hewlett Packard Enterprise-developed Slingshot Ethernet interconnect across the vast Aurora infrastructure, will ultimately determine the sustained performance achieved by the system. Based on conservative estimates, it is anticipated that Aurora will secure a position at the summit of the Top500 list in November of this year, delivering sustained Linpack performance ranging from 1.31 exaflops to 1.41 exaflops. Acquiring a machine of such immense power for a mere $200 million, assuming the reported financial details hold true, represents an unprecedented feat. Indeed, this is an exceptional deal that may never be replicated; even if Argonne had paid the full $500 million for 2 exaflops of peak performance, a figure comparable to Lawrence Livermore's investment in El Capitan, it would still have been a fair price.
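For context, a quick sketch of what that projected range implies about Linpack efficiency (sustained performance as a fraction of peak), again using the 2,007 petaflops figure disclosed below:

```python
# Implied Linpack efficiency for the projected sustained range.
peak_ef = 2.007  # exaflops, from Intel's disclosed 2,007 petaflops
for sustained_ef in (1.31, 1.41):
    print(f"{sustained_ef} EF sustained -> {sustained_ef / peak_ef:.0%} of peak")
# Roughly 65% to 70%, broadly in line with other large GPU-accelerated Linpack runs.
```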
From our vantage point, if the reported $300 million writeoff did indeed serve as fair compensation for Argonne's troubles, it is only fitting that Rick Stevens, the associate laboratory director for computing, environment, and life sciences, can finally allow himself a well-deserved smile. The completion of the Aurora project fulfills Argonne's mandate of pioneering groundbreaking computational technologies while ensuring that alternative suppliers remain available. With the imminent deployment of this remarkable system, scientific exploration can commence in earnest, promising profound advancements on the horizon.
Examining the Technical Specifications
The original Aurora 2018 machine, with Cray as a subcontractor supplying its "Shasta" system designs (now sold as the HPE Cray EX line), entailed a combination of HBM, DDR, and Optane memory technologies. Weighing in at over 7 PB, this memory configuration offered an aggregate memory bandwidth of more than 30 PB/sec. The Omni-Path interconnect, a key component, incorporated silicon photonics to deliver an aggregate node injection bandwidth of over 2.5 PB/sec (note that this value represents bytes, not bits). Additionally, the system featured a burst buffer leveraging Intel flash drives and a Lustre file system with a capacity exceeding 150 PB and a throughput of over 1 TB/sec.
The specifications of the Aurora 2023 machine were recently unveiled by Jeff McVeigh, general manager of Intel's Super Compute Group, during a prebriefing ahead of the ISC 2023 supercomputing conference in Hamburg, Germany. The Aurora 2023 system groups its blades into six-node enclosures paired with HPE 200 Gb/sec Slingshot switches, which are built around the "Rosetta" ASIC. While Intel has delivered all the blades, it remains unclear whether all the CPUs and GPUs have been fully integrated into them at this stage.
During the conference call, McVeigh proudly announced, “We are very pleased to announce we have delivered over 10,000 blades. We have much more work to do for full optimization, delivering on the codes, and acceptance. But this is a critical milestone that we’re very, very happy to have accomplished.”
The Aurora 2023 machine offers an unprecedented level of power, clocking in at an impressive 2,007 petaflops (rounding up to 2.01 exaflops). This figure represents an 11.2-fold increase in performance compared to what the original Aurora 2018 machine was projected to achieve. To support the intensified computational capabilities, the combined memory capacity across the CPUs and GPUs has surged to 20.4 PB, a staggering 2.9-fold increment. Moreover, the aggregate memory bandwidth has skyrocketed to 245.4 PB/sec, a remarkable 8.2-fold improvement.
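These multipliers can be cross-checked against the Aurora 2018 figures above. A minimal sketch, assuming the widely reported 180 petaflops target for the original machine (the article itself quotes only the ratio):

```python
# Gen-over-gen multipliers, from the Aurora 2018 and 2023 figures above.
pairs = {
    "peak compute (PF)":       (2007.0, 180.0),  # 180 PF: original Aurora target
    "memory capacity (PB)":    (20.4, 7.0),      # "over 7 PB" in the 2018 design
    "memory bandwidth (PB/s)": (245.4, 30.0),    # "more than 30 PB/sec" in 2018
}
for name, (new, old) in pairs.items():
    print(f"{name}: {new / old:.1f}x")  # ~11.2x, ~2.9x, ~8.2x
```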
In terms of network performance, the Slingshot-based Cray network, operating at 2.18 PB/sec injection bandwidth, falls roughly 13 percent short of the 2.5 PB/sec anticipated with the Omni-Path 200 network. Despite this deviation, it is crucial to acknowledge that the Cray network spans a considerably larger machine. Additionally, the bi-section bandwidth demonstrates an impressive 38 percent increase, reaching 0.69 PB/sec. The storage capabilities of Aurora are equally impressive, with 1,024 nodes dedicated to the DAOS file system. This setup boasts a capacity of 230 PB and a bandwidth of 31 TB/sec, signifying a 53 percent expansion in capacity and a roughly 30-fold increase in file system throughput for the Aurora 2023 machine.
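A compact sketch verifying those percentages from the raw figures in this section; note that the 0.5 PB/sec bi-section baseline is inferred from the stated 38 percent gain rather than quoted directly:

```python
# Network and storage deltas, computed from the figures in this section.
print(f"Injection bandwidth deficit: {1 - 2.18 / 2.5:.0%}")  # ~13% below the Omni-Path plan
print(f"Bi-section bandwidth gain:   {0.69 / 0.5 - 1:.0%}")  # ~38% over an inferred 0.5 PB/s
print(f"DAOS capacity growth:        {230 / 150 - 1:.0%}")   # ~53% over the 150 PB Lustre plan
print(f"File system throughput:      {31 / 1:.0f}x")         # ~31x over 1 TB/sec
```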
McVeigh also mentioned that Intel and Argonne are collaborating with the technical community to develop AuroraGPT, a generative large language model enriched with a comprehensive corpus of scientific data. Furthermore, McVeigh shared a range of benchmarks showcasing the capabilities of Ponte Vecchio GPUs. However, considering the scaled-back nature of the GPUs in Aurora, primarily due to thermal constraints, the relevance of these benchmarks to Aurora’s performance may be limited. Nonetheless, the OpenMC Monte Carlo simulation results, comparing Aurora’s testbed machines to those equipped with AMD Instinct MI250X and Nvidia A100 GPUs, offer promising insights.
To comprehensively assess Aurora’s potential, more details regarding the OpenMC test are required, enabling a thorough comparison between Aurora’s nodes and the forthcoming Nvidia “Hopper” H100 GPUs, as well as future AMD Instinct MI300A CPU-GPU hybrids.
Conclusion:
The development and imminent deployment of the Aurora supercomputer at Argonne National Laboratory, with its enhanced architecture and impressive computational capabilities, have significant implications for the market. This advancement represents a major step forward in high-performance computing, particularly in the intersection of AI and scientific research. With its potential to deliver sustained performance in the exaflops range, Aurora establishes itself as a pioneering solution that can propel innovation in various industries reliant on advanced computing capabilities.
The remarkable cost-efficiency of the project, coupled with Intel's willingness to absorb a substantial writeoff to make up for earlier setbacks, showcases the potential for future advancements in supercomputing. As the market witnesses the transformative power of Aurora and its impact on scientific endeavors, demand for similar cutting-edge technologies is expected to rise, pushing the boundaries of computational possibility and opening doors to unprecedented discoveries.