New Era of AI Performance: Intel’s Success in MLPerf Training 3.0

TL;DR:

  • Intel’s Habana Gaudi2 and 4th Gen Intel Xeon Scalable processors deliver remarkable performance in the MLPerf Training 3.0 benchmark.
  • Intel’s AI solutions provide competitive alternatives to Nvidia GPUs for generative AI and large language models.
  • Gaudi2 demonstrates outstanding performance and scalability on the demanding GPT-3 model.
  • Intel Xeon processors offer efficient and cost-effective AI capabilities for enterprises.
  • MLPerf results highlight the excellent scalability and performance of Intel’s solutions in various deep learning models.
  • Intel’s Advanced Matrix Extensions (Intel AMX) deliver substantial out-of-the-box performance gains.
  • Intel’s continued commitment to open-source software and cost-effective Ethernet networking reinforces its position in the AI market.

Main AI News:

Intel has once again demonstrated its prowess in the field of artificial intelligence (AI) with outstanding results in the MLPerf Training 3.0 benchmark. The latest report published by MLCommons showcases the impressive performance of both the Habana Gaudi2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor, solidifying Intel’s position as a leader in AI technology.

Sandra Rivera, Intel’s executive vice president and general manager of the data center and AI group, emphasizes the value that Intel Xeon processors and Gaudi deep learning accelerators bring to the AI landscape. She states, “The latest MLPerf results published by MLCommons validate the TCO value Intel Xeon processors and Intel Gaudi deep learning accelerators provide to customers in the area of AI. Xeon’s built-in accelerators make it an ideal solution to run volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI.”

Intel’s scalable systems, coupled with optimized and easy-to-program open software, offer customers and partners a seamless experience in deploying a wide range of AI-based solutions. From the cloud to the intelligent edge, Intel’s portfolio of AI solutions breaks free from closed ecosystems, providing competitive options that enhance efficiency and scalability.

The prevailing industry narrative suggests that generative AI and large language models (LLMs) can run only on Nvidia GPUs; Intel’s data challenges this notion. The MLPerf Training 3.0 results underscore the exceptional performance of Intel’s products across various deep learning models. In particular, the maturity of Gaudi2-based software and systems for training was demonstrated at scale on the demanding GPT-3 language model. Gaudi2 is one of only two semiconductor solutions to submit performance results to the benchmark for GPT-3 LLM training.

Moreover, Gaudi2 offers customers substantial cost advantages in server and system pricing alongside strong overall performance. Its MLPerf-validated results on GPT-3, computer vision, and natural language models, together with upcoming software advancements, position Gaudi2 as a compelling price/performance alternative to Nvidia’s H100.

On the CPU front, the 4th Gen Xeon processors with Intel AI engines exhibit exceptional deep learning training performance. Customers can build a single universal AI system for data pre-processing, model training, and deployment, leveraging Xeon-based servers. This approach ensures optimal AI performance, efficiency, accuracy, and scalability.

Rivera further highlights the significance of Habana Gaudi2’s performance and scalability in training generative AI and large language models. She states, “Training generative AI and large language models requires clusters of servers to meet massive compute requirements at scale. These MLPerf results provide tangible validation of Habana Gaudi2’s outstanding performance and efficient scalability on the most demanding model tested, the 175-billion-parameter GPT-3.”

The MLPerf Training 3.0 results include several notable achievements. Gaudi2 recorded an impressive time-to-train on GPT-3 of 311 minutes on 384 accelerators, and scaling from 256 to 384 accelerators on the GPT-3 model was near-linear, at 95% efficiency. Intel’s solutions also achieved excellent training results on computer vision models such as ResNet-50 and Unet3D, as well as natural language processing models like BERT.
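
To make the scaling claim concrete, here is a minimal sketch of how a strong-scaling efficiency figure like this is typically computed from two time-to-train measurements. Only the 311-minute figure on 384 accelerators comes from the results above; the 256-accelerator time below is a hypothetical placeholder chosen purely to illustrate the arithmetic.

```python
def scaling_efficiency(t_small: float, n_small: int, t_large: float, n_large: int) -> float:
    """Strong-scaling efficiency: achieved speedup divided by ideal speedup."""
    ideal_speedup = n_large / n_small   # e.g., 384 / 256 = 1.5x more hardware
    actual_speedup = t_small / t_large  # how much faster training actually got
    return actual_speedup / ideal_speedup

# 311 minutes on 384 accelerators is the published figure; 443 minutes on
# 256 accelerators is a hypothetical value used only to illustrate the math.
eff = scaling_efficiency(t_small=443, n_small=256, t_large=311, n_large=384)
print(f"{eff:.0%}")  # -> 95%
```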

The performance increases of 10% and 4% for BERT and ResNet models, respectively, compared to the previous submission, signify the continuous growth and maturity of Gaudi2 software. Importantly, Gaudi2’s results were achieved “out of the box,” meaning customers can expect comparable performance whether implementing Gaudi2 on-premise or in the cloud.

The software support for the Gaudi platform continues to evolve, keeping pace with the rising demand for generative AI and LLMs. Gaudi2’s GPT-3 submission is based on PyTorch and employs the popular DeepSpeed optimization library, which improves scaling efficiency on LLMs through 3D parallelism (data, tensor, pipeline), as sketched below.
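
For readers unfamiliar with this setup, the sketch below shows roughly how a PyTorch model is wired into DeepSpeed with pipeline parallelism; data parallelism then falls out of the remaining ranks, and tensor parallelism would additionally shard each layer’s weights, Megatron-style. The layer stack, stage count, and batch sizes are illustrative assumptions, not details of the actual Gaudi2 submission.

```python
# Minimal DeepSpeed pipeline-parallel sketch (run under the deepspeed
# launcher so the distributed environment is initialized). The plain
# Linear stack stands in for transformer blocks; all sizes are
# illustrative, not taken from the Gaudi2 GPT-3 submission.
import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule

# Pipeline parallelism: split a stack of layers across 4 pipeline stages.
layers = [nn.Linear(1024, 1024) for _ in range(24)]
model = PipelineModule(layers=layers, num_stages=4)

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config={
        "train_batch_size": 256,              # global batch (data parallel)
        "train_micro_batch_size_per_gpu": 4,  # micro-batches fill the pipeline
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "bf16": {"enabled": True},
    },
)
# Training then proceeds with engine.train_batch(data_iter) per step.
```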

Looking ahead, Gaudi2’s performance is expected to see a significant leap with the release of software support for FP8 and new features in the third quarter of 2023.

Rivera emphasizes the unique advantages offered by Intel Xeon processors, stating, “As the lone CPU submission among numerous alternative solutions, MLPerf results prove that Intel Xeon processors provide enterprises with out-of-the-box capabilities to deploy AI on general-purpose systems and avoid the cost and complexity of introducing dedicated AI systems.”

The MLPerf results highlight the remarkable performance achieved by 4th Gen Xeons in both closed and open divisions. For instance, BERT and ResNet-50 models were trained in less than 50 minutes and less than 90 minutes, respectively, in the closed division. In the open division, Xeon achieved a time of approximately 30 minutes when scaling to 16 nodes for BERT. Even larger models like RetinaNet could be trained using off-peak Xeon cycles, offering flexibility and efficiency to customers.

Intel’s Advanced Matrix Extensions (Intel AMX) drive much of this out-of-the-box performance, with improvements that extend across multiple frameworks, end-to-end data science tools, and a broad ecosystem of smart solutions.
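
In practice, no AMX-specific API is required: when PyTorch executes bfloat16 compute on an AMX-capable 4th Gen Xeon, the oneDNN backend dispatches the matrix math to the AMX tile units automatically. A minimal sketch, with an illustrative model and shapes:

```python
import torch
import torch.nn as nn

# Illustrative model; any matmul-heavy workload benefits similarly.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
x = torch.randn(32, 1024)

# Autocast to bfloat16 on CPU; on a 4th Gen Xeon the linear layers'
# matrix multiplies run on the AMX tile units via oneDNN.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```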

Rivera concludes by acknowledging the significance of MLPerf as a reputable benchmark for AI performance. She notes that Intel has surpassed the 100-submission milestone and remains the only vendor to submit public CPU results using industry-standard deep-learning ecosystem software. Intel’s commitment to fair and repeatable performance comparisons is reinforced by its continued investment in the open-source Intel Ethernet Fabric Suite software, which runs over cost-effective and readily available Intel Ethernet 800 Series network adapters.

Conclusion:

Intel’s strong showing in the MLPerf Training 3.0 benchmark is a significant achievement, positioning the company as a leader in the AI market. The success of Habana Gaudi2 and 4th Gen Intel Xeon Scalable processors demonstrates their capabilities in delivering high-performance AI solutions. By providing competitive alternatives to Nvidia GPUs, Intel offers customers the opportunity to break free from closed ecosystems and benefit from increased efficiency and scalability. The impressive scalability and performance across various deep learning models further strengthen Intel’s position. With Advanced Matrix Extensions and a commitment to open-source software, Intel continues to innovate and deliver solutions that meet the growing demand for AI technologies.
