TL;DR:
- Harvard dropouts secure $5.36 million in seed funding for their chip startup, Etched.ai.
- Etched.ai is developing a specialized AI accelerator chip for large language model (LLM) inference.
- The company plans to compete with incumbents by specializing further and designing a more efficient inference architecture for LLMs.
- The market for running transformers is rapidly evolving, presenting a significant opportunity for Etched.ai.
- The founders emphasize the importance of specialization and predict that even industry leader Nvidia will follow suit.
- Etched.ai’s first chip, codenamed Sohu, promises 140× the throughput per dollar compared to Nvidia’s H100 GPU.
- The company is focusing on reducing software complexity and targeting customers who want to run ChatGPT-style models more cheaply.
- Seed funding will be used for team expansion, RTL front-end development, and engagement with IP providers.
- Etched.ai plans to pursue a Series A funding round in the near future.
Main AI News:
Two 21-year-old Harvard dropouts have raised $5.36 million in a seed round for their chip startup, Etched.ai, which aims to build an AI accelerator chip designed specifically for large language model (LLM) acceleration. The founders shared the news with EE Times. Primary Venture Partners led the round, with participation from MAX Ventures and prominent angels including former eBay CEO Devin Wenig. The raise values Etched.ai at $34 million.
Gavin Uberti, CEO of Etched.ai, told EE Times that he had initially planned to take a year off from Harvard, but instead found himself working on the Apache TVM open-source compiler and microkernels at OctoML. While writing microkernels for Arm Cortex-M4 and Cortex-M7 cores, he noticed a gap in Arm's instruction set: it has no 8-bit SIMD MAC (multiply-accumulate) instructions. Because 8-bit operands have to go through the 16-bit path, 8-bit MAC operations ran at only half the speed of their 16-bit counterparts.
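To illustrate the gap, here is a minimal, portable-C sketch (not Etched.ai's or OctoML's code): Armv7E-M offers a dual 16-bit multiply-accumulate instruction (SMLAD) but no 8-bit equivalent, so int8 data must first be sign-extended to 16 bits before the same dual MAC can be used, roughly the workaround used in practice by kernels such as CMSIS-NN's.

```c
#include <stdint.h>

/* 16-bit path: two multiply-accumulates per step. On Armv7E-M this
   maps onto a single SMLAD instruction, retiring two MACs per cycle. */
int32_t dot_i16(const int16_t *a, const int16_t *b, int n, int32_t acc) {
    for (int i = 0; i + 1 < n; i += 2) {  /* assumes n is even */
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
    }
    return acc;
}

/* 8-bit path: with no 8-bit SIMD MAC instruction, each int8 pair must
   first be sign-extended to 16 bits (the SXTB16 step) before the same
   dual 16-bit MAC can run; that extra widening work is why 8-bit MACs
   ran at roughly half the 16-bit rate. */
int32_t dot_i8(const int8_t *a, const int8_t *b, int n, int32_t acc) {
    for (int i = 0; i + 1 < n; i += 2) {
        int16_t a0 = a[i], a1 = a[i + 1];  /* widen: the extra step */
        int16_t b0 = b[i], b1 = b[i + 1];
        acc += (int32_t)a0 * b0;
        acc += (int32_t)a1 * b1;
    }
    return acc;
}
```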
“The perpetual need to grapple with this oversight at work prompted me, along with Chris [Zhu, Etched.ai’s CTO], to contemplate the possibility of achieving better results,” explained Uberti. “Concurrently, we witnessed a transformative shift in the landscape of language models.”
Uberti was referring to the surge of interest in LLMs such as ChatGPT, which are built on the transformer architecture. That shift inspired Uberti and Zhu to start their own chip company, with the goal of designing an inference architecture far more efficient for LLMs. No LLM-specific accelerators are on the market today, though Nvidia has announced software features targeting transformers, and various accelerator companies have declared support for language and vision transformers. Etched.ai intends to differentiate itself from existing players by specializing further still.
Uberti emphasized the crucial role specialization plays in achieving the extraordinary advancements they are pursuing. “We firmly believe that the substantial improvements we are striving for can only be accomplished through a focused approach, not just in the realm of AI but also in a more specific context,” he stated. “We predict that Nvidia will eventually move in this direction. The opportunity is simply too colossal to overlook.”
Drawing a parallel, Uberti cited the success of specialized ASICs (application-specific integrated circuits) for Bitcoin mining. Within the AI accelerator sector, several companies have already built architectures specialized for particular workloads: some, like Kneron, focus on edge architectures optimized for convolutional neural networks (CNNs), while others, like Neuchips, target data-center architectures for deep learning recommendation models (DLRMs), a workload GPUs are notoriously bad at accelerating. Nvidia, for its part, has built a software feature called the Transformer Engine into its current H100 GPU, enabling LLM inference without further quantization.
Hyperscalers, meanwhile, have shown a growing appetite for building their own chips for their own workloads. Meta recently unveiled an in-house DLRM inference chip that is already widely deployed, while Google's TPU and AWS's Inferentia target more general workloads.
Zhu cautioned against comparing recommendation workloads too closely with LLMs. The market for running transformers barely existed six months ago, he noted, whereas recommendation models have had far longer to mature. "The world has undergone a rapid transformation, and this very evolution presents our unprecedented opportunity," Zhu remarked.
However, Etched.ai faces a potential pitfall: the rapid evolution of workloads within the AI domain. Uberti acknowledged this risk but remained undeterred. “Indeed, this is a genuine concern that has dissuaded many others from venturing down this path. However, the fundamental nature of transformers remains unaltered,” Uberti assured. Drawing a comparison between GPT-2, a language model from four years ago, and Meta’s recent Llama model, Uberti highlighted that the differences primarily revolve around size and activation functions. While the training process may differ, these variances are inconsequential when it comes to inference.
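To make that activation-function difference concrete, here is a hedged C sketch (an illustration, not either model's actual implementation): GPT-2's feed-forward blocks use GELU, while Llama's use SwiGLU, a SiLU-gated variant fed by two linear projections; the attention machinery around them is structurally the same.

```c
#include <math.h>

/* GPT-2's feed-forward activation: GELU (tanh approximation). */
float gelu(float x) {
    const float c = 0.7978845608f; /* sqrt(2/pi) */
    return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}

/* Llama's feed-forward activation: SwiGLU = SiLU(gate) * value, where
   gate and value come from two separate linear projections. */
float silu(float x) { return x / (1.0f + expf(-x)); }
float swiglu(float gate, float value) { return silu(gate) * value; }
```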
Uberti's optimism stems from the stability of the basic components that make up transformers. Refinements do arise, but, as he put it, "Innovations are not conjured out of thin air. Rather, there exists a cyclical pattern where academia publishes research, which then undergoes a process of integration over time."
He cited gated linear activation units, introduced in 2018, which only found their way into Google's PaLM model in 2022, as well as the ALiBi method for positional encoding, introduced in 2021 but not widely adopted until the end of 2022. Uberti estimated that a typical startup needs 18 to 24 months to develop a chip from scratch; adoption lags like these give such a chip time to stay relevant.
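For reference, ALiBi's mechanism is simple enough to sketch in a few lines of C (a minimal illustration based on the published method, not any production kernel): in place of learned positional embeddings, each attention head subtracts a fixed, head-specific linear penalty, proportional to the query-key distance, from its attention scores.

```c
#include <math.h>

/* Per-head slope: the geometric sequence from the ALiBi paper,
   2^(-8*(h+1)/n) for head h of n heads. */
float alibi_slope(int head, int n_heads) {
    return powf(2.0f, -8.0f * (float)(head + 1) / (float)n_heads);
}

/* Subtract m*(i-j) from the score of query i attending to key j
   (causal attention, so only j <= i is touched). */
void add_alibi_bias(float *scores, int seq_len, int head, int n_heads) {
    float m = alibi_slope(head, n_heads);
    for (int i = 0; i < seq_len; i++)      /* query position */
        for (int j = 0; j <= i; j++)       /* key position */
            scores[i * seq_len + j] -= m * (float)(i - j);
}
```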
“The edge industry has provided us with invaluable insights. The key lesson they have learned is to avoid over-specialization, as one cannot predict the future. By placing their bets in the wrong area, they risk rendering their efforts obsolete,” Uberti shared. “We have chosen to discard this advice and chart our own course.”
The duo has been refining ideas for their first chip, codenamed Sohu. They claim the chip, equipped with a large amount of memory, can deliver 140 times the throughput per dollar of an Nvidia H100 PCIe card processing GPT-3 tokens. Uberti indicated that the figure comes mostly from higher throughput rather than a dramatically lower price, and that Sohu's design prioritizes support for large batch sizes.
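To be clear about what that metric means, throughput per dollar is simply tokens per second divided by card price. The numbers in this sketch are placeholders chosen to produce a 140x ratio, not published Etched.ai or Nvidia figures.

```c
#include <stdio.h>

/* Throughput per dollar = (tokens/s) / (card price in USD).
   All four values below are hypothetical placeholders. */
int main(void) {
    double h100_tokens_per_s = 1000.0;    /* hypothetical */
    double h100_price_usd    = 30000.0;   /* hypothetical */
    double sohu_tokens_per_s = 105000.0;  /* hypothetical */
    double sohu_price_usd    = 22500.0;   /* hypothetical */

    double ratio = (sohu_tokens_per_s / sohu_price_usd) /
                   (h100_tokens_per_s / h100_price_usd);
    printf("throughput-per-dollar ratio: %.0fx\n", ratio); /* 140x */
    return 0;
}
```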
Uberti said a significant portion of Sohu's architecture has already been worked out. Its tiled design should speed development and keep complexity down, and supporting only one model type greatly simplifies the software stack, particularly the compiler.
Etched.ai aims to serve anyone seeking a more cost-effective way to run ChatGPT-style models; beyond that, the company is still finalizing its business model. Uberti confirmed that the company already has commitments from customers, but declined to share specifics.
The seed funding will go toward recruiting an initial team, starting RTL (register-transfer level) front-end development, and opening discussions with IP providers. The company has already hired Mark Ross, who served as CTO of Cypress Semiconductor in the early 2000s, as its chief architect.
The company plans to pursue a Series A funding round, tentatively early next year.
“Most investors adopt a naturally skeptical stance, and rightfully so, as they perceive a pair of undergraduates endeavoring to revolutionize the semiconductor industry,” Zhu remarked. “Nonetheless, there exists a considerable contingent of investors who are profoundly impressed and enthralled by the vision we present and the limitless potential we envision.”
Conclusion:
Etched.ai’s successful fundraising efforts and its pursuit of a specialized LLM accelerator chip signal a significant development in the AI chip market. The growing interest in large language models and the need for efficient inference architectures present a valuable opportunity for Etched.ai to establish itself as a key player. By emphasizing specialization and leveraging its innovative chip design, the company aims to compete with established players and cater to the increasing demand for cost-effective AI utilization. With their strategic approach and strong backing, Etched.ai is poised to make a substantial impact in the evolving landscape of AI chip development.