TL;DR:
- Two Harvard dropouts, founders of Etched.ai, secure $5.36 million in seed funding for their chip startup.
- Etched.ai plans to develop an AI accelerator chip dedicated solely to large language models (LLMs).
- The company aims to specialize in LLM inference, competing with incumbents by focusing on a single architecture.
- The founders highlight the recent surge in interest in LLMs and the need for more efficient inference architectures.
- Sohu, the codename for Etched.ai’s upcoming chip, is expected to deliver 140× the throughput per dollar of Nvidia’s H100 GPU.
- The funding will be used for team expansion, RTL front-end development, and engagement with IP providers.
- Etched.ai’s business model is evolving, with committed customer investments already in place.
- The company plans to pursue a Series A funding round in the near future, while investors’ responses have been mixed.
- The Sohu chip is targeted for availability in 2024, promising to revolutionize AI acceleration and LLM inference.
Main AI News:
Etched.ai, a chip startup founded by two enterprising 21-year-old Harvard dropouts, has raised $5.36 million in a seed round to support its ambitious venture. The company aims to develop an AI accelerator chip designed specifically for large language model (LLM) acceleration, the duo announced in an exclusive interview with EE Times. Primary Venture Partners and MAX Ventures led the funding round, joined by notable angel investors such as former eBay CEO Devin Wenig. The round values Etched.ai at an impressive $34 million.
Etched.ai CEO Gavin Uberti told EE Times that his initial plan was to take a year-long break from Harvard. Instead, he ended up at OctoML, where he worked on the open-source Apache TVM compiler and on microkernels. While engaged in this work, Uberti noticed a crucial limitation of Arm’s Cortex-M4 and Cortex-M7 cores: the absence of an 8-bit Multiply-Accumulate (MAC) SIMD instruction. Although these cores support various other 8-bit SIMD operations, the missing MAC means 8-bit dot products must first be widened to 16-bit lanes, costing extra instructions and leaving quantized inference slower than it needs to be.
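A minimal sketch of that workaround, the pattern CMSIS-NN-style kernels use, is below. It assumes the CMSIS-Core intrinsics `__SXTB16`, `__ROR`, and `__SMLAD`, which exist on Armv7E-M parts with the DSP extension:

```c
#include <stdint.h>
/* Assumes a Cortex-M4/M7 build where the CMSIS-Core device header
   provides the __SXTB16, __ROR, and __SMLAD intrinsics. */

/* Dot product of four signed 8-bit pairs packed into 32-bit words.
   With no 8-bit SIMD MAC, each operand is first widened to two
   16-bit lanes (__SXTB16), then fed to the 16-bit dual MAC
   (__SMLAD): six instructions where one fused 8-bit MAC would do. */
static inline int32_t dot4_q7(uint32_t a, uint32_t b, int32_t acc)
{
    uint32_t a02 = __SXTB16(a);            /* sign-extend bytes 0 and 2 */
    uint32_t a13 = __SXTB16(__ROR(a, 8));  /* sign-extend bytes 1 and 3 */
    uint32_t b02 = __SXTB16(b);
    uint32_t b13 = __SXTB16(__ROR(b, 8));
    acc = __SMLAD(a02, b02, acc);          /* two 16-bit MACs */
    return __SMLAD(a13, b13, acc);         /* two more */
}
```

A single fused 8-bit SIMD MAC would collapse those six instructions into one, which is precisely the gap Uberti kept running into.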
Uberti expressed his frustration, stating, “It could never be fixed, and every time I’d go to work, I’d have to deal with this [oversight], and it made me think with Chris [Zhu, Etched.ai CTO] that we have to be able to do this better. At the same time, we saw there was a change happening in the world of language models.”
The change Uberti referred to was the rise of transformer-based LLMs such as the model behind ChatGPT, and it prompted him and Zhu to establish a chip company with a vision of designing a more efficient inference architecture for LLMs. While no LLM-specific accelerator exists on the market today, Nvidia has announced software features targeting transformers, and other accelerator companies have pledged support for language and vision transformers. To differentiate itself, Etched.ai plans to specialize further.
Uberti emphasized, “You can’t achieve the kind of improvements we’re witnessing through a generalized approach. Instead, you have to wholeheartedly invest in a single architecture—not just for AI, but for something more specific. We believe that, eventually, Nvidia will pursue this specialization. We consider the opportunity too significant to overlook.”
Drawing a parallel, Uberti pointed to the success of specialized ASICs in Bitcoin mining. In the AI accelerator domain, several companies have built architectures around specific workloads: Kneron has focused on CNN-centric designs at the edge, while data-center-oriented designs, such as those from Neuchips, have centered on DLRM (deep learning recommendation models), a workload that is difficult to accelerate on GPUs. Nvidia, for its part, has already introduced the Transformer Engine in its current H100 GPU, a software feature that exploits the chip’s FP8 support to run LLM inference without a separate quantization step.
Additionally, hyperscalers’ inclination toward developing custom chips for their own workloads poses a challenge. Meta recently announced its in-house DLRM inference chip, which it says is already widely deployed internally, while Google’s TPU and AWS’ Inferentia cater to more general workloads.
Zhu cautioned that any comparison with recommendation workloads should consider the timescales involved, pointing out that recommendation models have reached a relatively mature stage, unlike transformers. He remarked, “This is a very recent development—the market for running transformers didn’t really exist six months ago, whereas DLRM, on the other hand, has had a comparatively longer presence. The world has changed very rapidly, and that is our opportunity.”
However, Uberti acknowledged the potential risks associated with over-specialization in the rapidly evolving AI landscape. Despite these concerns, he remained optimistic, stating, “That’s a real risk, and I think it’s turning off a lot of other people from going down this route, but transformers aren’t changing. If you look back four years to GPT-2, compared to Meta’s recent Llama model, there are just two differences—the size and the activation function. There are differences in how it is trained, but that doesn’t matter for inference.”
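The activation-function difference Uberti is pointing at lives in the transformer’s feed-forward layer. A minimal scalar sketch of the two styles (GELU in its common tanh approximation for GPT-2, SwiGLU for Llama), illustrative only and not taken from either codebase:

```c
#include <math.h>

/* GPT-2 style feed-forward activation: GELU applied to a single
   projection (tanh approximation). */
static double gelu(double x)
{
    const double c = 0.7978845608028654;  /* sqrt(2/pi) */
    return 0.5 * x * (1.0 + tanh(c * (x + 0.044715 * x * x * x)));
}

/* Llama style: SwiGLU gates an "up" projection with the SiLU of a
   second "gate" projection, so the FFN carries two input matrices
   instead of one. */
static double silu(double x) { return x / (1.0 + exp(-x)); }

static double swiglu(double gate, double up)
{
    return silu(gate) * up;
}
```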
While Uberti acknowledged the fixed nature of the basic components of transformers, he also highlighted the lag between academic advances and their adoption in production models. He cited examples such as gated linear units, which were introduced in academic work years before they found their way into Google’s PaLM model in 2022, and the ALiBi method for positional encoding, which gained widespread adoption only toward the end of 2022. Uberti estimated that a typical startup would require 18-24 months to develop a chip from scratch.
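ALiBi makes the same point at the kernel level: it drops learned positional embeddings entirely and instead subtracts a head-specific linear penalty from attention scores. A rough sketch of the idea (a hypothetical helper, not from any particular implementation):

```c
/* ALiBi: before the softmax, subtract a head-specific slope m times
   the query-key distance from each causal attention score. scores is
   a seq_len x seq_len row-major matrix; slopes are typically a
   geometric sequence (1/2, 1/4, ...) across heads. */
void alibi_bias(float *scores, int seq_len, float m)
{
    for (int q = 0; q < seq_len; q++)      /* query position */
        for (int k = 0; k <= q; k++)       /* causal key positions */
            scores[q * seq_len + k] -= m * (float)(q - k);
}
```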
Uberti further added, “The edge industry offers us valuable insights—the one lesson they’ve learned is not to specialize. You don’t know what the future holds, and if you place your bets in the wrong place, you could become irrelevant. We disregarded that advice and pursued our own path.”
Sohu Chip: Pioneering Enhanced Performance
Uberti and Zhu have been actively exploring concepts for their inaugural chip, codenamed Sohu. They claim Sohu can deliver 140× the throughput per dollar of an Nvidia H100 PCIe card processing GPT-3 tokens.
Uberti said Sohu will feature abundant memory capacity, and that the headline figure stems chiefly from raw throughput rather than a dramatic cost difference; the chip is designed to run very large batch sizes, which is what drives that throughput. The founders also said Sohu’s architecture is largely fleshed out: a tiled design that speeds development while minimizing complexity, and an exclusive focus on one model type that greatly simplifies the software stack, particularly the compiler.
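To see why throughput rather than price has to carry that claim, consider the back-of-the-envelope identity below, where $T$ is tokens per second and $P$ is card price (symbols are illustrative; no company figures are implied):

$$
\frac{(T/P)_{\text{Sohu}}}{(T/P)_{\text{H100}}} \;=\; \frac{T_{\text{Sohu}}}{T_{\text{H100}}} \times \frac{P_{\text{H100}}}{P_{\text{Sohu}}} \;\approx\; 140
$$

If the two cards sell at broadly similar prices, the price ratio is close to 1, so nearly all of the claimed 140× must come from the throughput ratio, which matches Uberti’s framing.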
Etched.ai’s potential customer base encompasses anyone seeking to run ChatGPT-style models cost-effectively. While the company continues to refine its business model, Uberti confirmed that it already had committed customer investments, though he did not disclose specifics.
The seed funding will primarily be allocated to building an initial team, commencing RTL front-end development, and initiating discussions with IP providers. Notably, Etched.ai has already secured the services of Mark Ross as the chief architect. Ross brings valuable expertise, having previously served as the CTO of Cypress in the early 2000s.
Looking ahead, Etched.ai aims to pursue a Series A funding round, expected to begin early next year. While many investors are skeptical given the founders’ youth, Zhu noted that a significant portion remain impressed and excited by the pitch and the impact the company could achieve.
Etched.ai is steadfastly committed to making the Sohu chip available to the market by 2024, revolutionizing the realm of AI acceleration and LLM inference.
Conclusion:
Etched.ai’s successful seed round funding and its focus on developing a specialized LLM accelerator chip, Sohu, signify a growing recognition of the demand for more efficient inference architectures. By betting on a single architecture and targeting the evolving landscape of language models, Etched.ai aims to differentiate itself from existing players. The impressive performance metrics projected for Sohu, along with committed customer investments, demonstrate promising market potential.
However, the skepticism from some investors highlights the challenges associated with youthful founders venturing into the competitive semiconductor industry. Nevertheless, if Etched.ai can deliver on its promises, its Sohu chip could disrupt the market by providing cost-effective and high-performance solutions for LLM acceleration.