MosaicML Unveils MPT-30B: A Powerful 30B Model Challenging LLaMA, Falcon, and GPT

TL;DR:

  • MosaicML launches MPT-30B, a 30-billion parameter language model surpassing GPT-3 in quality.
  • MPT-30B handles longer sequences and offers increased flexibility for data-heavy enterprise applications.
  • MosaicML utilizes “FlashAttention” for faster inference and training, distinguishing it from Falcon and LLaMA.
  • The model is optimized for real hardware constraints, making it easier to deploy on GPUs.
  • MPT-30B demonstrates favorable performance compared to LLaMA and Falcon, particularly in coding-related tasks.
  • OpenAI’s GPT-4 boasts superior capabilities but is costlier to operate, while MosaicML focuses on specific industry data.
  • MosaicML offers developers an API integration option, customization through fine-tuning, and tools for pre-training custom models.
  • The platform is compatible with third-party tools like LangChain, enabling enhanced customization and ownership.
  • Open source LLMs like MosaicML’s are empowering enterprise developers and providing customizable solutions.

Main AI News:

In a strategic move, MosaicML is launching its groundbreaking large language model (LLM), MPT-30B, as a successor to the MPT-7B model it introduced in May. To explore what the new release means for developers, we sat down with MosaicML’s co-founder and CEO, Naveen Rao, a seasoned AI industry veteran whose previous deep learning company, Nervana, was acquired by Intel in 2016.

As the name suggests, MPT-30B has 30 billion parameters. Despite having roughly one-sixth the parameters of OpenAI’s 175-billion-parameter GPT-3, MosaicML claims that its model surpasses GPT-3 in quality. The company highlights that this smaller footprint allows MPT-30B to run on local hardware, offering a more cost-effective path to deployment and inference.

Distinguishing it from competitors LLaMA and Falcon, MPT-30B was trained on longer sequences, accommodating up to 8,000 tokens compared to the roughly 2,000 tokens handled by GPT-3, LLaMA, and Falcon. This extended context positions MPT-30B as a strong fit for data-intensive enterprise applications that require lengthier prompts. Notably, MosaicML’s earlier 7-billion-parameter variant, MPT-7B-StoryWriter-65k+, features a context length of 65,000 tokens, giving users even more room for crafting prose and generating longer outputs.
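
For developers who want to sanity-check these context windows, a minimal sketch is shown below. It assumes the publicly listed mosaicml/mpt-30b and mosaicml/mpt-7b-storywriter checkpoints on the Hugging Face Hub and the `max_seq_len` config field described in MosaicML’s model cards; verify the values against the current cards.

```python
# Minimal sketch: inspect the context window of an MPT checkpoint via
# Hugging Face transformers. Repo ids and the `max_seq_len` field follow
# MosaicML's published model cards.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mosaicml/mpt-30b", trust_remote_code=True)
print(config.max_seq_len)  # expected ~8k tokens, vs. ~2k for GPT-3/LLaMA/Falcon

story_cfg = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
print(story_cfg.max_seq_len)  # expected ~65k tokens for the StoryWriter variant

# Because MPT uses ALiBi positional biases rather than learned position
# embeddings, the window can in principle be raised at load time
# (memory permitting):
config.max_seq_len = 16384
```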

One of the notable differences emphasized by Rao is MosaicML’s use of an attention mechanism known as “FlashAttention.” This mechanism, described in a 2022 academic paper, offers faster inference and training, setting MosaicML apart from Falcon and LLaMA. These efficiency gains translate into more cost-effective computing, making MosaicML an attractive choice for enterprise applications. Moreover, MosaicML designed its 30-billion-parameter model to perform well within the constraints of real hardware, specifically deep-learning GPUs with 40-80 GB of memory. By contrast, the 40-billion-parameter Falcon LLM struggles to fit within those limits, whereas MPT-30B fits comfortably on a single 80 GB GPU.
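
To make both points concrete, here is a rough sketch: the memory figures are back-of-the-envelope arithmetic for 16-bit weights, and the attention call uses PyTorch’s built-in fused scaled_dot_product_attention purely as a stand-in for the FlashAttention idea, not MosaicML’s actual kernel.

```python
# Sketch, not MosaicML's implementation: (1) why 30B parameters fit a single
# 80 GB GPU in 16-bit precision while 40B is tight, and (2) the shape of a
# fused, FlashAttention-style call that never materializes the full
# (seq_len x seq_len) score matrix. Requires PyTorch 2.0+.
import torch
import torch.nn.functional as F

def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weights only, ignoring KV cache, activations, and framework overhead."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes per GB

print(weight_memory_gb(30))  # ~60 GB -> fits an 80 GB GPU with room for the KV cache
print(weight_memory_gb(40))  # ~80 GB -> leaves essentially no headroom

# Fused attention in a single kernel call; on supported GPUs PyTorch dispatches
# this to a FlashAttention-style backend.
q = k = v = torch.randn(1, 16, 2048, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 2048, 64])
```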

Rao asserts that MosaicML’s 30B parameter model compares favorably to both LLaMA and Falcon, despite being trained with less compute thanks to the company’s efficiency-focused methods. Which model comes out ahead depends on the evaluation metric, but MosaicML’s model shows notable gains on coding-related tasks. Independently validating these claims against Stanford’s HELM benchmark remains difficult, however, as none of the three open source LLM projects mentioned (MosaicML’s MPT, LLaMA, or Falcon) has been evaluated there.

In the face-off between MosaicML and OpenAI’s GPT-4, Rao acknowledges GPT-4’s overall superior capabilities. However, he underscores the distinctive advantage of MosaicML’s model—its longer context length. This feature enables unique use cases, such as generating an epilogue for F. Scott Fitzgerald’s renowned novel, “The Great Gatsby.” Rao candidly admits that as a former English Literature major, he personally finds such applications of LLMs less appealing. Nonetheless, the main hurdle posed by large models like GPT-4 lies in their exorbitant operational costs, rendering them impractical for most enterprises. MosaicML, on the other hand, specializes in catering to companies with specific data needs, including sensitive information, enabling the fine-tuning of models for specific industries.

Highlighting the value of MosaicML’s offerings across various sectors, Rao illustrates how industries like healthcare and banking can leverage the platform’s ability to interpret and summarize extensive data sets. In healthcare, for instance, the model can analyze lab results and provide insights into a patient’s history, empowering medical professionals with comprehensive information derived from diverse inputs. Rao stresses that open source models, like those provided by MosaicML, are particularly valuable in scenarios involving sensitive data, necessitating secure handling within a firewall rather than relying on API-based approaches offered by platforms like OpenAI.

For developers interested in leveraging MosaicML’s platform, Rao outlines options tailored to their needs and expertise. The company provides an API, akin to offerings from providers like OpenAI, for integrating MosaicML’s models into front-end applications; Rao says these hosted models are more cost-effective than similarly sized models from other providers. Alternatively, developers can customize MosaicML’s models by fine-tuning them on their own data, downloading the weights and standing up a private API around the customized version. Advanced developers with substantial data reserves can use MosaicML’s tools to pre-train custom models from scratch and serve them via the MosaicML platform.
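
As a rough illustration of the download-and-customize route, the open checkpoint can be pulled from the Hugging Face Hub and run locally with standard tooling; the snippet below is a sketch assuming the mosaicml/mpt-30b repo and sufficient GPU memory, not MosaicML’s managed API.

```python
# Sketch of the downloadable-model route: local inference with transformers.
# Fine-tuning would start from these same objects with any standard trainer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name = "mosaicml/mpt-30b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,    # MPT ships custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",         # shard across available GPUs (requires accelerate)
)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Summarize this quarterly report:", max_new_tokens=128)[0]["generated_text"])
```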

Addressing compatibility concerns, Rao confirms that MosaicML’s platform works seamlessly with popular third-party tools, including LangChain. In fact, developers can utilize these tools on top of a custom model built with MosaicML, offering unparalleled customization and ownership over the entire model. With MosaicML, developers have the freedom to tailor their models to specific requirements, a level of customization that is absent in API providers like OpenAI.
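
A hedged sketch of what that can look like in practice follows, using LangChain’s HuggingFacePipeline wrapper around a locally hosted MPT model; the import paths and chain API reflect 2023-era LangChain releases, and a fine-tuned checkpoint directory could be substituted for the base repo id.

```python
# Sketch: LangChain on top of a locally hosted MPT model. Check your installed
# LangChain version for current import paths.
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from transformers import pipeline

hf_pipe = pipeline(
    "text-generation",
    model="mosaicml/mpt-30b",   # or a locally fine-tuned checkpoint directory
    trust_remote_code=True,
    max_new_tokens=256,
)
llm = HuggingFacePipeline(pipeline=hf_pipe)

prompt = PromptTemplate(
    input_variables=["report"],
    template="Summarize the following lab report for a clinician:\n\n{report}",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(report="Hemoglobin 13.8 g/dL; WBC 6.2 x 10^9/L; fasting glucose 101 mg/dL"))
```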

Despite the playful banter surrounding LLaMA and Falcon during the discussion, Rao ultimately views these open source LLMs as part of the same team, with proprietary platforms such as OpenAI representing their true competition. In Rao’s opinion, open source LLMs restore power to enterprise developers by providing them with comprehensive solutions, while centralized platforms lack the flexibility and customization options that developers require. Although Rao acknowledges that open LLMs may not have fully bridged the gap with closed source models, he firmly believes that they have reached a threshold of usefulness that cannot be ignored.

Conclusion:

The launch of MosaicML’s MPT-30B marks a significant development in the market. With its large parameter count and strong quality, MPT-30B competes directly with established models such as GPT-3. The model’s ability to handle longer sequences and its efficient attention mechanism give it an edge over competitors Falcon and LLaMA. MosaicML’s emphasis on real hardware constraints, cost-effectiveness, and industry-specific data further solidifies its position. This signals a shift in power to enterprise developers, offering them customizable solutions and challenging the dominance of proprietary platforms like OpenAI. The market can expect increased competition, driving innovation and further advancements in the field of large language models.

Source