Ambarella Showcases LLM Inference Capabilities on Cutting-Edge Autonomous Driving Chip

TL;DR:

  • Ambarella demonstrates LLM inference on its CV3-AD chip designed for autonomous driving.
  • CV3-AD achieves 25 tokens per second running the Llama2-13B model with Ambarella’s AI software stack.
  • The LLM demo runs on the neural vector processor (NVP) within Ambarella’s CVFlow engine.
  • The company plans to extend support for other LLM models in the future.
  • Ambarella aims to deliver A100-level LLM performance with just a quarter of the power consumption.
  • Existing customers in the security and automotive sectors are considering LLM integration into edge systems.
  • Ambarella foresees the potential of multi-modal generative AI at the edge.
  • Developing a dedicated LLM-specific edge chip is under consideration.
  • Ambarella explores the possibility of using the CV3 family in data center applications.
  • The company is strategizing its position in the hyperscale and cloud-based solutions market.

Main AI News:

In a recent demonstration, Ambarella, the computer vision chip company, showed that it can run large language models (LLMs) on its CV3-AD chip, a member of the CV3 chip family designed for autonomous driving applications. The CV3-AD, which began sampling approximately a year ago, performs inference at 25 tokens per second while running the Llama2-13B model, using Ambarella’s pre-existing AI software stack.
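That 25-tokens-per-second figure is easiest to interpret through the memory-bandwidth lens: single-stream LLM decoding is typically limited by how fast the chip can stream model weights from DRAM, not by raw compute. A back-of-envelope sketch in Python (the weight precision and LPDDR5 bandwidth below are illustrative assumptions, not Ambarella-published figures):

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound LLM.
# Both the weight precision and the bandwidth figure are illustrative
# assumptions, not Ambarella-published numbers.

params = 13e9              # Llama2-13B parameter count
bytes_per_param = 0.5      # assume ~4-bit quantized weights (assumption)
dram_bw_bytes_s = 200e9    # assumed effective LPDDR5 bandwidth (assumption)

weight_bytes = params * bytes_per_param        # ~6.5 GB streamed per token
ceiling_tok_s = dram_bw_bytes_s / weight_bytes  # bandwidth-bound ceiling

print(f"bandwidth-bound ceiling: ~{ceiling_tok_s:.0f} tokens/s")  # ~31
```

Under these assumptions the ceiling lands near 31 tokens per second, which is at least consistent with the 25 tokens per second Ambarella reports.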

Ambarella CEO Fermi Wang told EE Times in an exclusive interview, “When transformers gained popularity earlier this year, we began pondering whether we could leverage our potent inference engine for this purpose. After conducting swift assessments, we concluded that indeed we can, and we estimate that we can approach the performance level of Nvidia’s A100.”

At the heart of Ambarella’s on-chip capability is the CVFlow engine, which comprises a neural vector processor (NVP) and a general vector processor (GVP). The LLM demonstration runs on the NVP, which uses a data-flow architecture: instructions for higher-level operations, such as convolution, are combined into graphs that describe the flow of data through the processor for each operation, and operators communicate with one another through on-chip memory. Notably, the CV3 family uses LPDDR5 (as opposed to HBM) and consumes approximately 50 watts of power.
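To make the data-flow idea concrete in the abstract, the sketch below models an operator graph whose nodes are high-level operations and whose intermediate results stay local rather than round-tripping through DRAM. This is a generic, hypothetical illustration, not Ambarella’s CVFlow graph format or toolchain:

```python
# Hypothetical illustration of a data-flow operator graph: nodes are
# high-level operations, edges carry intermediate results. This is NOT
# Ambarella's CVFlow format; it only sketches the general idea.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str                                   # e.g. "conv", "matmul"
    inputs: list = field(default_factory=list)  # upstream Op nodes

    def run(self, tensors):
        # Placeholder: a real engine would dispatch to hardware kernels.
        return f"{self.name}({', '.join(tensors)})"

def execute(node: Op):
    """Evaluate the graph bottom-up; in a data-flow engine the
    operator-to-operator traffic would live in on-chip memory."""
    results = [execute(op) for op in node.inputs]
    return node.run(results or ["input"])

# A toy two-operator pipeline: a convolution feeding a matrix multiply.
conv = Op("conv")
head = Op("matmul", inputs=[conv])
print(execute(head))  # matmul(conv(input))
```

Keeping operator-to-operator traffic in on-chip memory is what allows such a design to avoid spending scarce DRAM bandwidth on intermediate tensors.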

The LLM demonstration required new software, including building blocks for the core operations of the transformer architecture. Les Kohn, Ambarella’s CTO, explained, “Over time, we plan to extend this support to encompass other models, but currently, Llama2 is emerging as a de facto standard in the open-source domain. While it entails a significant investment, it is far from starting from scratch.”
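For reference, the central operation such building blocks must cover is scaled dot-product attention, alongside the large matrix multiplies around it. Here is a minimal, textbook NumPy sketch of single-head attention; it illustrates the operation itself, not Ambarella’s kernels:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the core transformer operation.
    Q, K, V: (seq_len, d) arrays. Textbook form, not Ambarella's code."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Toy usage: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```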

Ambarella is now strategically positioned to address critical challenges faced by LLM technology. Wang elaborated, “Now that we know we possess the technology, we can address real-world problems. If you speak to experts in the LLM domain and inquire about their primary concerns, two issues invariably stand out: price and power consumption.”

With the CV3-AD designed to operate within a 50-watt power envelope, Wang aims to deliver LLM performance comparable to Nvidia’s A100 at a fraction of the power. He emphasized, “This means that with the same amount of data center power, we can deliver four times the AI performance—a substantial value proposition. While it may sound straightforward, we believe this presents a valuable opportunity for those seeking to harness LLMs. The demand for LLMs has surged in the past six months.”
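The arithmetic behind Wang’s statement is simple to spell out. A minimal sketch that takes the quarter-power-at-equal-performance estimate at face value (the rack power budget below is a hypothetical number for illustration):

```python
# Fixed-power-budget arithmetic behind the "four times the AI performance"
# claim, taking Ambarella's quarter-power estimate at face value.
# The rack budget is a hypothetical figure for illustration only.

budget_w = 10_000                       # hypothetical rack power budget
cv3_power_w = 50                        # CV3-AD envelope, from the article
a100_equiv_power_w = 4 * cv3_power_w    # "a quarter of the power" claim

cv3_chips = budget_w // cv3_power_w                 # 200 CV3-AD chips
a100_equivalents = budget_w // a100_equiv_power_w   # 50 A100-class units

print(cv3_chips / a100_equivalents)  # 4.0x aggregate AI performance
```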

Although hyperscale companies were early adopters of LLM technology, Ambarella’s existing customer base in security cameras and automotive sectors is now contemplating the integration of LLMs into their edge systems and roadmap strategies. Wang highlighted, “We believe that LLMs will emerge as a vital technology on our roadmap for current clients. The current CV3 platform can effortlessly accommodate LLMs, necessitating minimal additional engineering effort from Ambarella. Thus, this is not a distraction; rather, it aligns with our customers’ evolving needs.”

Furthermore, Ambarella envisions the potential of multi-modal generative AI—large models capable of generating both textual and visual content—at the edge. Kohn elaborated, “For applications like robotics, transformer networks offer unprecedented capabilities in computer vision processing, surpassing traditional models due to their ability to handle zero-shot learning—a capability absent in smaller models.”

Zero-shot learning refers to the capability of models to deduce information about object classes absent from their training data, enabling them to predict and address edge cases more effectively—an invaluable feature in autonomous systems. Kohn emphasized, “Autonomous driving essentially falls under the purview of robotics. To achieve L4/L5 autonomous systems, we require vastly superior, more versatile AI models capable of comprehending the world in a manner akin to human cognition. We perceive this as a means to unlock more potent AI processing across diverse edge applications.”
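As a concrete illustration of zero-shot classification, CLIP-style vision-language models compare an image embedding against text embeddings of candidate labels, so a class unseen in training can be recognized simply by naming it. The sketch below uses random stand-ins for real encoder outputs and is not Ambarella’s model:

```python
import numpy as np

def zero_shot_classify(image_vec, label_vecs, labels):
    """Pick the label whose text embedding is most similar to the image
    embedding (cosine similarity). CLIP-style; encoders assumed given."""
    img = image_vec / np.linalg.norm(image_vec)
    txt = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    scores = txt @ img                    # cosine similarity per label
    return labels[int(np.argmax(scores))]

# Toy usage with random stand-ins for real encoder outputs. A class like
# "stroller" need not appear in training data; only its text embedding does.
rng = np.random.default_rng(1)
labels = ["pedestrian", "cyclist", "stroller"]
label_vecs = rng.standard_normal((3, 512))
image_vec = label_vecs[2] + 0.1 * rng.standard_normal(512)  # near "stroller"
print(zero_shot_classify(image_vec, label_vecs, labels))    # stroller
```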

Addressing the possibility of developing a dedicated LLM-specific edge chip, Wang stated, “It is a consideration that warrants our attention. LLMs demand a significant amount of DRAM bandwidth, making it nearly impossible to integrate additional functions on the same chip, as other functions also require DRAM bandwidth.”

While some hold that a large infotainment chip should accommodate various workloads alongside LLMs, Wang said this is presently unfeasible: the performance and bandwidth demands of LLMs necessitate a separate accelerator. Kohn added, “Depending on the model’s size, we may witness smaller models being applied in domains like robotics, where they won’t need to handle the extensive tasks that larger models do. Nonetheless, the demand for enhanced performance persists, and we anticipate the emergence of optimized solutions catering to a range of price-performance requirements.”

Beyond the edge, Ambarella is contemplating the potential utilization of the CV3 family in data center environments. The CV3 family boasts multiple PCIe interfaces, which are potentially beneficial in multi-chip systems. Wang pondered, “For us, the question is whether we can position our current and future products effectively in hyperscale or cloud-based solutions. This remains an unanswered question, but we have confirmed the viability of our technology, and we possess differentiating features. We are actively formulating a strategy for this endeavor and hope to justify further investments if we decide to venture into cloud-based solutions.”

Conclusion:

Ambarella’s innovative strides in integrating LLM capabilities into its cutting-edge chips not only offer significant advantages for the automotive and security sectors but also signal the company’s commitment to embracing the evolving landscape of artificial intelligence and edge computing. As the demand for LLMs continues to surge, Ambarella remains at the forefront of delivering advanced solutions to meet the complex requirements of modern technology applications.