Generative AI Training Competition Heats Up: Google, Intel, and Nvidia Go Head to Head

TL;DR:

  • MLPerf’s generative AI tests see Google, Intel, and Nvidia competing vigorously.
  • Nvidia’s Eos, a colossal 10,752-GPU supercomputer, sets a new benchmark for GPT-3 training.
  • Intel’s Gaudi 2, with FP8 enabled, reports a 103 percent speedup, roughly halving time-to-train.
  • Intel also showcases the fine-tuning capabilities of its Xeon CPU systems.
  • Overall, a 2.8-fold performance boost in five months, a 49-fold increase in five years.

Main AI News:

The arena of generative AI training has witnessed a fierce battle among tech giants Google, Intel, and Nvidia. These industry leaders have been vying for supremacy in the race to harness the power of artificial intelligence and machine learning. MLPerf, the authoritative benchmark for evaluating how quickly computer systems can train neural networks, has recently evolved to include tests for training large language models (LLMs), notably GPT-3. Its latest addition, a test built around the text-to-image generator Stable Diffusion, has raised the stakes even higher.

Intel and Nvidia, renowned for their computing prowess, have dedicated substantial resources to this challenge. Notably, Nvidia’s formidable 10,000-plus-GPU supercomputer, one of the largest systems ever benchmarked, took on the task of training GPT-3, a testament to the monumental scale that generative AI demands. Even this colossal system, despite its sheer power, would have needed roughly eight days to train the LLM in full.

The contest drew participation from 19 prominent companies and institutions, yielding more than 200 results. These results showcased a remarkable 2.8-fold performance improvement over the past five months and a staggering 49-fold increase since MLPerf’s inception five years ago.
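Those headline multipliers imply a steep compounding rate. As a quick illustration (simple arithmetic on the quoted figures, not an MLPerf statistic), the 49-fold gain over five years works out to roughly a 2.2x improvement per year:

```python
# Implied compound annual improvement from the figures quoted above.
total_gain = 49                          # 49-fold since MLPerf's inception
years = 5
annual_rate = total_gain ** (1 / years)
print(f"Implied improvement: ~{annual_rate:.2f}x per year")  # ~2.18x per year
```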

Nvidia’s Dominance Persists: Introducing Eos

Nvidia continues to assert its dominance in the MLPerf benchmarks, powered by its H100 GPUs. However, the spotlight now falls on Eos, Nvidia’s groundbreaking 10,752-GPU AI supercomputer, which achieved a remarkable feat. Eos completed the GPT-3 training benchmark in under four minutes, setting a new standard for rapid AI training. Azure, Microsoft’s cloud computing arm, tested a system of identical size and lagged behind Eos by mere seconds.

Eos’s GPUs are nothing short of astonishing, capable of executing an aggregate 42.6 exaflops, that is, 42.6 billion billion floating-point operations per second. They are interconnected through Nvidia’s Quantum-2 InfiniBand, which moves data at 1.1 petabytes (1.1 million billion bytes) per second. Dave Salvator, Nvidia’s director of AI benchmarking and cloud computing, aptly describes these numbers as “mind-blowing.” Eos represents a significant leap in capability, tripling the number of H100 GPUs integrated into a single machine, and that threefold increase yielded a 2.8-fold performance improvement, underscoring the critical role of efficient scaling in advancing generative AI.
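As a sanity check on those scaling figures, here is a back-of-the-envelope calculation using only the numbers quoted above (the per-GPU figure assumes low-precision throughput and is an estimate, not an official Nvidia specification):

```python
# Back-of-the-envelope check on the scaling figures quoted above.
# All inputs come from the article; nothing here is an official Nvidia number.

gpus_after = 10_752           # Eos's H100 count
gpus_before = gpus_after / 3  # the article says the GPU count tripled
speedup = 2.8                 # reported performance gain from that tripling

scaling_efficiency = speedup / (gpus_after / gpus_before)
print(f"Scaling efficiency: {scaling_efficiency:.0%}")  # ~93% of ideal linear scaling

# Per-GPU compute implied by the 42.6-exaflop aggregate (low-precision throughput):
per_gpu_petaflops = 42.6 * 1_000 / gpus_after
print(f"~{per_gpu_petaflops:.1f} petaflops per GPU")
```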

The GPT-3 benchmark Eos tackled is not a complete training run; it measures the time to reach a checkpoint that establishes the training would converge to the required accuracy given more time. Eos dispatched that portion in just under four minutes, but full training demands a far larger time investment: roughly eight days on Eos itself, and over four months on a more modest system of 512 H100s.
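The jump from minutes to months follows from simple proportionality. A rough estimate, assuming near-linear scaling (which real clusters only approximate):

```python
# Rough proportionality estimate, assuming near-linear scaling across GPU counts.
eos_gpus = 10_752
small_gpus = 512
full_training_days_on_eos = 8          # article: a full GPT-3 run would take ~8 days on Eos

scale_factor = eos_gpus / small_gpus   # 21x fewer GPUs
est_days = full_training_days_on_eos * scale_factor
print(f"Full training on {small_gpus} H100s: ~{est_days:.0f} days (~{est_days / 30:.1f} months)")
# ~168 days, i.e. several months -- consistent with the "over four months" figure above
```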

Intel’s Ascension: Leveraging Gaudi 2 and Beyond

Intel has been steadily closing in on the competition, submitting results using the Gaudi 2 accelerator chip and, notably, systems relying solely on its fourth-generation Xeon CPUs. A significant shift from previous rounds is Intel’s activation of Gaudi 2’s 8-bit floating-point (FP8) capability; support for ever-lower-precision arithmetic has been a key driver of GPU performance improvements over the past decade.

Eitan Medina, chief operating officer at Intel’s Habana Labs, notes that the results met and even exceeded expectations: enabling FP8 delivered a 103 percent speedup, roughly halving time-to-train on a 384-accelerator cluster. On a per-chip basis, Gaudi 2 now trains at approximately one-third the speed of an Nvidia system, and it is three times faster than Google’s TPUv5e on the image-generation benchmark.
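Note that the 103 percent figure is a throughput improvement: slightly more than doubling speed, which cuts time-to-train roughly in half rather than eliminating it. A quick conversion:

```python
# Converting a throughput speedup into a time-to-train reduction.
speedup_pct = 103                 # reported gain from enabling FP8
speedup = 1 + speedup_pct / 100   # 2.03x throughput
time_fraction = 1 / speedup       # new training time as a fraction of the old
print(f"{speedup:.2f}x faster -> time-to-train cut by {1 - time_fraction:.0%}")
# 2.03x faster -> time-to-train cut by ~51%
```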

Medina emphasizes the cost-effectiveness of Gaudi 2 compared to Nvidia’s H100, and he anticipates further advantages with the upcoming Intel accelerator chip, Gaudi 3. Set to enter volume production in 2024, Gaudi 3 will employ the same semiconductor manufacturing process as the Nvidia H100, setting the stage for an even more intense rivalry.

In addition to the MLPerf benchmarks, Intel demonstrated its fine-tuning capabilities on the Stable Diffusion image generator: a 4-node Xeon system featuring the AMX matrix engine completed the task in under five minutes. Fine-tuning, a crucial aspect of AI development, adapts an already-trained neural network to a specific task, as exemplified by Nvidia’s chip-design AI, a fine-tuned version of its NeMo large language model.
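The fine-tuning pattern itself is simple to illustrate. Below is a minimal, generic PyTorch sketch of the idea (freeze a pretrained backbone, train only a small task-specific head); it illustrates the technique in general, not Intel’s Xeon pipeline or Nvidia’s NeMo workflow, and the model and data here are arbitrary placeholders:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pretrained backbone (a stand-in for any large pretrained model).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained weights so only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False

# Swap in a task-specific head (10 output classes here, chosen arbitrarily).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on placeholder data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
print(f"fine-tuning step loss: {loss.item():.3f}")
```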

The battle for supremacy in generative AI training continues to intensify, with Google, Intel, and Nvidia pushing the boundaries of innovation and performance. As technology evolves, we can expect further remarkable developments in this arena, ultimately shaping the future of artificial intelligence.

Conclusion:

The intense competition among Google, Intel, and Nvidia in the generative AI training arena underscores the rapid evolution and growing capabilities of AI technology. With breakthroughs like Nvidia’s Eos and Intel’s Gaudi 2, the market can anticipate enhanced AI performance, cost-efficiency, and scalability, driving innovation and expanding the practical applications of artificial intelligence. This fierce race is a testament to the industry’s commitment to pushing the boundaries of what AI can achieve.
