MLPerf Reports Unveil Rapid Advancements in AI Performance

TL;DR:

  • MLPerf Training v3.0 benchmarks demonstrate up to 1.54x performance gains compared to six months ago and 33-49x gains over the first round, reflecting the rapid innovation in machine learning systems.
  • The addition of a large language model (LLM) and updated recommender benchmarks to the MLPerf Training suite reflects the evolving landscape of generative AI adoption.
  • MLPerf Tiny v1.1 benchmarks focus on low-power device processing, showcasing increased energy efficiency, privacy, and autonomy of edge devices.
  • MLPerf Tiny receives diverse submissions from academic, industry, and national lab participants, highlighting the range of hardware solutions and innovative software frameworks covered.

Main AI News:

The recent release of results from the MLPerf™ benchmark suites by MLCommons®, an open engineering consortium, has brought to light remarkable advancements in the field of artificial intelligence. The two suites, Training v3.0 and Tiny v1.1, measure machine learning performance for model training and for inference on low-power devices, respectively.

The acceleration of training models has paved the way for researchers to tap into new frontiers, including the latest developments in generative AI. In the latest round of MLPerf Training, the results showcased impressive industry participation and highlighted substantial performance gains, with an increase of up to 1.54x compared to just six months ago and an astonishing 33-49x improvement over the first round. These extraordinary advancements reflect the rapid pace of innovation in machine learning systems.

The MLPerf Training benchmark suite encompasses comprehensive tests that rigorously evaluate machine learning models, software frameworks, and hardware across a wide range of applications. By providing an open-source and peer-reviewed benchmark suite, MLPerf ensures fair competition that drives innovation, performance, and energy efficiency throughout the industry.

This round of MLPerf Training brought two changes to the suite, reflecting the evolving landscape of AI adoption. The first is a new large language model (LLM) benchmark based on the GPT-3 reference model, a testament to the growing prevalence of generative AI. The second is an updated recommender benchmark, modified to better align with industry practice and now using the DLRM-DCNv2 reference model. These changes help ensure that industry-standard benchmarks stay aligned with the latest trends and continue to provide valuable guidance to customers, vendors, and researchers alike.

David Kanter, the executive director of MLCommons, expressed enthusiasm for the debut of GPT-3 and DLRM-DCNv2, emphasizing their development based on extensive community feedback and collaboration with leading customers. Kanter underlined the commitment to keeping MLPerf benchmarks representative of modern machine learning practices.

The MLPerf Training v3.0 round boasts over 250 performance results, a remarkable 62% increase compared to the previous round. These results were submitted by 16 different companies, including ASUSTek, Azure, Dell, Fujitsu, GIGABYTE, H3C, IEI, Intel & Habana Labs, Krai, Lenovo, NVIDIA, NVIDIA + CoreWeave, Quanta Cloud Technology, Supermicro, and xFusion. MLCommons extends congratulations to CoreWeave, IEI, and Quanta Cloud Technology, who participated in MLPerf Training for the first time.

Ritika Borkar, co-chair of the MLPerf Training Working Group, commended the tireless efforts of system engineers in pushing the boundaries of performance in workloads that hold significant value for users. Borkar expressed particular excitement regarding the inclusion of an LLM benchmark in this round, as it has the potential to revolutionize countless applications and inspire further system innovation.

In addition to training benchmarks, the MLPerf report also delved into the realm of embedded devices with MLPerf Tiny. These compact computing devices have become an integral part of our daily lives, powering applications ranging from tire sensors and appliances to fitness trackers. The MLPerf Tiny benchmark suite focuses on ML inference at the edge, catering to the growing demand for energy efficiency, privacy, responsiveness, and autonomy in edge devices. By eliminating networking overhead and running “tiny” neural networks, typically 100 kB and below, directly on the device, this approach offers a more efficient and secure alternative to cloud-centric processing. The benchmark suite covers a range of inference use cases that process sensor data, such as audio and vision, to provide endpoint intelligence for low-power devices in the smallest form factors. To ensure fairness and reproducibility, MLPerf Tiny subjects these capabilities to rigorous testing and offers optional power measurement.
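To make the “tiny” scale concrete, here is a minimal sketch of what a sub-100 kB model can look like. It is a hypothetical depthwise-separable CNN written with TensorFlow/Keras, not one of the official MLPerf Tiny reference models; with 8-bit weights its footprint comes to only a few kilobytes, well inside the budget described above.

```python
# Minimal sketch of a "tiny" image classifier (hypothetical, for illustration only;
# not an MLPerf Tiny reference model).
import tensorflow as tf
from tensorflow.keras import layers

def build_tiny_vision_model(input_shape=(32, 32, 3), num_classes=10):
    """A small image classifier in the spirit of endpoint/edge vision workloads."""
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        # Standard convolution to lift the input into a small feature space.
        layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),
        # Depthwise-separable blocks keep the parameter count tiny.
        layers.DepthwiseConv2D(3, padding="same", activation="relu"),
        layers.Conv2D(32, 1, activation="relu"),
        layers.DepthwiseConv2D(3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 1, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_tiny_vision_model()
params = model.count_params()
# With int8 quantization (roughly 1 byte per weight), the on-device footprint is:
print(f"{params} parameters ≈ {params / 1024:.1f} kB as int8")
```

A model of this size can typically fit entirely in the on-chip memory of a microcontroller, which is what makes the no-network, on-device approach described above practical.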

The latest round of MLPerf Tiny, version 1.1, received submissions from a diverse group of participants, including academic institutions, industry organizations, and national labs. Bosch, cTuning, fpgaConvNet, Kai Jiang, Krai, Nuvoton, Plumerai, Skymizer, STMicroelectronics, and Syntiant contributed to the benchmark suite, producing 159 peer-reviewed results. Additionally, this round featured 41 power measurements. MLCommons extends its congratulations to Bosch, cTuning, fpgaConvNet, Kai Jiang, Krai, Nuvoton, and Skymizer for their inaugural submissions to MLPerf Tiny.

Kanter also expressed excitement over the increasing number of companies embracing the MLPerf Tiny benchmark suite. He emphasized the value of a standardized benchmark that enables device manufacturers and researchers to identify the most suitable solutions for their respective use cases. The expanded range of hardware solutions and innovative software frameworks covered in the v1.1 release highlights the breadth of participation from new companies. Dr. Csaba Kiraly, co-chair of the MLPerf Tiny Working Group, noted that combined software and hardware performance improvements have reached up to 1000-fold in certain areas compared to the initial reference benchmark results, underscoring the rapid pace of innovation in this field.

Conclusion:

The latest MLPerf reports reveal significant advancements in AI performance across training and low-power device processing. The substantial performance gains in MLPerf Training benchmarks indicate the rapid pace of innovation, empowering researchers to unlock new capabilities. The addition of benchmarks for large language models and updated recommenders reflects the industry’s adoption of generative AI. Furthermore, MLPerf Tiny benchmarks highlight the growing demand for energy-efficient, privacy-focused, and autonomous edge devices. The diverse range of participants and submissions in both Training and Tiny benchmarks signifies the market’s commitment to driving innovation and offering a variety of hardware solutions and software frameworks. This progress opens up new opportunities for businesses to leverage AI technologies and create more capable and intelligent systems.

Source