The National Center for Supercomputing Applications has received a $10 million grant to enhance its Delta system with an AI partition called DeltaAI

TL;DR:

  • The National Center for Supercomputing Applications (NCSA) has received a $10 million grant to enhance its Delta system with an AI partition called DeltaAI.
  • DeltaAI will be based on Nvidia’s powerful “Hopper” H100 GPU accelerators, significantly boosting the computational capabilities of the Delta supercomputer.
  • The original Delta system combined Apollo 2500 CPU nodes and Apollo 6500 CPU-GPU nodes, linked by the “Rosetta” Slingshot Ethernet interconnect.
  • The DeltaAI upgrade will populate Apollo 6500 nodes with newer H100 SXM5 GPU accelerators and leverage the Slingshot network for enhanced performance.
  • The Delta+DeltaAI machine will deliver a remarkable 16.6 petaflops peak FP64 performance and an astonishing 732.8 petaflops peak FP16 performance for AI workloads.

Main AI News:

In a remarkable development, the National Center for Supercomputing Applications (NCSA) at the University of Illinois has recently received a substantial $10 million grant from the National Science Foundation. This generous funding is aimed at expanding the Delta system, a cutting-edge supercomputer, with an artificial intelligence (AI) partition known as DeltaAI. This new partition, leveraging Nvidia’s powerful “Hopper” H100 GPU accelerators, will propel the Delta system to new heights of computational prowess.

The global landscape of academic high-performance computing (HPC) centers is vast and diverse, encompassing numerous institutions equipped with highly capable systems. Collectively, these centers likely account for a significant portion—roughly two-thirds, to hazard an estimate—of the world’s HPC capacity. Yet, because the widely recognized Top500 supercomputing list captures only those machines whose operators choose to run and submit benchmark results, the exact details of these academic research centers’ computing capabilities remain largely unknown.

In light of these circumstances, it is worth asking what $10 million buys in computational capacity today. The original Delta machine, which we previously examined, carried a price tag identical to the grant amount. It comprised a blend of Apollo 2500 CPU nodes and Apollo 6500 CPU-GPU nodes, both supplied by Hewlett Packard Enterprise (HPE) and interconnected using the “Rosetta” Slingshot Ethernet interconnect, developed by Cray and now under HPE’s control.

The system consisted of 124 Apollo 2500 nodes, each featuring a pair of 64-core AMD “Milan” Epyc 7763 CPUs, as well as 200 Apollo 6500 nodes. Among the latter, 100 nodes housed the same Milan CPUs along with four 40 GB Nvidia “Ampere” A100 accelerators, while the remaining 100 nodes incorporated four Nvidia Ampere A40 accelerators—well suited to rendering, graphics, and AI inference. Each of these machines carried 256 GB of main memory, a modest quantity for AI work but still a reasonable, if somewhat light, 2 GB per core. (For optimal performance, 3 GB per core is preferable, and 4 GB per core is even more desirable.)

Additionally, the Delta system included a testbed partition based on Apollo 6500 enclosures, each pairing two Milan CPUs with eight 40 GB Nvidia A100 SXM4 GPUs interconnected via NVSwitch fabrics and backed by a capacious 2 TB of main memory. This configuration was explicitly tailored to AI workloads.
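The node inventory described above can be sketched as data; the partition names, field names, and helper functions here are illustrative, while the counts come straight from the text (the AI testbed partition is omitted because its node count is not given):

```python
CORES_PER_CPU = 64  # AMD "Milan" Epyc 7763

# Original Delta partitions as described above; keys and field names are ad hoc.
DELTA_PARTITIONS = {
    "cpu":  {"nodes": 124, "cpus": 2, "gpus_per_node": 0, "gpu": None,        "mem_gb": 256},
    "a100": {"nodes": 100, "cpus": 2, "gpus_per_node": 4, "gpu": "A100 40GB", "mem_gb": 256},
    "a40":  {"nodes": 100, "cpus": 2, "gpus_per_node": 4, "gpu": "A40",       "mem_gb": 256},
}

def total_gpus(partitions):
    """Total accelerators across all partitions."""
    return sum(p["nodes"] * p["gpus_per_node"] for p in partitions.values())

def mem_per_core_gb(partition):
    """Main memory per CPU core, in GB."""
    return partition["mem_gb"] / (partition["cpus"] * CORES_PER_CPU)
```

On these counts, Delta fields 800 accelerators across its two main GPU partitions, and every node lands at the 2 GB per core that the text calls modest.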

Adding up the cumulative computational capabilities of the CPUs and GPUs within the Delta system, the vector engines aimed at HPC workloads collectively deliver 6 petaflops of peak FP64 performance. Counting FP16 operations across the CPU vector engines and GPU matrix math engines—without sparsity enabled—the system reaches a staggering 131.1 petaflops for AI workloads. Notably, the NCSA has not appeared in the Top500 rankings since 2012, when Cray built the hybrid CPU-GPU “Blue Waters” system. Blue Waters, with a peak FP64 double-precision performance of 13.1 petaflops, carried a hefty $188 million price tag and incorporated 49,000 Opteron processors and 3,000 Nvidia GPUs. Although Delta’s FP64 performance falls short of Blue Waters’, it delivers ten times the throughput for FP16 calculations, while consuming substantially less power and occupying a considerably smaller physical footprint.
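Using only the figures quoted above, the Blue Waters comparison reduces to simple arithmetic; the variable names are ours:

```python
# Price (millions of USD) and peak petaflops, as quoted in the text.
BLUE_WATERS_COST_MUSD, BLUE_WATERS_FP64_PF = 188.0, 13.1
DELTA_COST_MUSD, DELTA_FP64_PF, DELTA_FP16_PF = 10.0, 6.0, 131.1

def musd_per_petaflops(cost_musd, pf):
    """Millions of dollars per peak petaflops."""
    return cost_musd / pf

# The "ten times" FP16 claim: Delta FP16 peak vs. Blue Waters FP64 peak.
fp16_ratio = DELTA_FP16_PF / BLUE_WATERS_FP64_PF  # ≈ 10
```

By that metric Blue Waters cost roughly $14.4 million per FP64 petaflops versus under $1.7 million for Delta—a decade of price/performance progress in one division.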

With limited information available about the $10 million DeltaAI upgrade, it appears that NCSA plans to add Apollo 6500 nodes populated with the latest “Hopper” H100 SXM5 GPU accelerators and attach them to the existing Slingshot network. While the details surrounding this award remain sparse, the announcement by the NSF and NCSA confirms that the compute elements of DeltaAI will incorporate over 300 state-of-the-art Nvidia graphics processors, delivering an impressive 600 petaflops of half-precision floating-point performance. This computational power will be spread across an advanced network interconnect to facilitate application communications, and will have access to an innovative flash-based storage subsystem.

Upon analyzing this information, a configuration featuring 38 servers, each equipped with eight H100 SXM5 GPU accelerators, aligns precisely with the specified parameters. This setup comprises 304 GPUs, ultimately yielding a peak FP16 performance of 601.6 petaflops, leveraging the H100 Tensor Cores with sparsity enabled. Therefore, it is plausible to assume that this represents the intended DeltaAI configuration. Budgetary calculations also support this hypothesis, considering an approximate 30 percent discount on GPUs and a mere 10 percent expenditure on networking. Furthermore, the addition of the DeltaAI partition will contribute an extra 10.6 petaflops of FP64 vector performance, combining the computational capabilities of the CPUs and GPUs across these 38 nodes. As a result, the overall performance of the Delta+DeltaAI supercomputer will culminate in a remarkable 16.6 petaflops at peak FP64 precision and an astounding 732.8 petaflops at peak FP16 precision—exemplifying its unrivaled proficiency in AI workloads.
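The back-of-the-envelope configuration above is easy to check. The 1,978.9 teraflops figure is Nvidia’s rated sparse FP16 Tensor Core throughput for the H100 SXM5; everything else is taken from the text:

```python
NODES, GPUS_PER_NODE = 38, 8
H100_FP16_SPARSE_TF = 1978.9  # Nvidia rated sparse FP16 Tensor Core peak, H100 SXM5

gpus = NODES * GPUS_PER_NODE  # 304 accelerators, just over the 300 promised
deltaai_fp16_pf = gpus * H100_FP16_SPARSE_TF / 1000.0  # ≈ 601.6 petaflops

# Combined Delta + DeltaAI peaks, using the partition totals quoted above.
fp64_total_pf = 6.0 + 10.6               # 16.6 petaflops peak FP64
fp16_total_pf = 131.1 + deltaai_fp16_pf  # ≈ 732.7 petaflops; the text rounds to 732.8
```

The 601.6 petaflops result lands exactly on the figure the announcement implies, which is what makes the 38-node, 304-GPU guess plausible.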

Conclusion:

The expansion of the Delta supercomputer with AI integration represents a significant milestone in the computing industry. The incorporation of Nvidia’s cutting-edge H100 GPU accelerators in the DeltaAI partition empowers the system with unprecedented computational power, enabling researchers and scientists to tackle complex AI workloads at an exceptional scale. This development showcases the continuous evolution of high-performance computing, opening new opportunities for breakthrough discoveries and advancements across various fields. It also underscores the growing significance of AI in scientific research, further driving the demand for advanced computing solutions in the market. The NCSA’s commitment to pushing the boundaries of computing capabilities sets a benchmark for other institutions and organizations, propelling the industry forward into a new era of accelerated scientific innovation.

Source