Introducing ChatGLM2-6B: The Evolution of Open-Source Bilingual Chat Models

TL;DR:

  • Tsinghua University’s ChatGLM2-6B is the second-generation model in the ChatGLM series of powerful open-source bilingual chat models.
  • ChatGLM2-6B builds upon the success of ChatGLM-6B, incorporating new features and enhancements.
  • It offers improved performance, longer context support (32K), and more efficient inference.
  • The model has been pre-trained on over 1.4 trillion English and Chinese tokens, showing clear improvements on benchmarks such as MMLU, CEval, and BBH.
  • ChatGLM2-6B can be deployed locally and is resource-efficient, making it highly accessible.
  • The model weights are now available for commercial use.
  • Tsinghua University researchers have open-sourced ChatGLM2-6B to encourage growth and innovation in the field of language models.

Main AI News:

In the realm of natural language conversation agents, the groundbreaking success of OpenAI’s ChatGPT has paved the way for significant advancements. Researchers worldwide are exploring techniques to make chatbot models more natural and engaging in their interactions with users. One notable alternative to ChatGPT is the ChatGLM model series, developed by researchers at Tsinghua University in China. Distinct from the well-known Generative Pre-trained Transformer (GPT) family of LLMs, the ChatGLM series is built on the General Language Model (GLM) framework and has gained considerable traction. Among the bilingual models in this series, ChatGLM-6B stands out with a parameter count of 6.2 billion. Pre-trained on over 1 trillion English and Chinese tokens, and subsequently fine-tuned for Chinese question-answering, summarization, and conversational tasks using reinforcement learning from human feedback, ChatGLM-6B has become a formidable contender.

One distinguishing feature of ChatGLM-6B is that it can be deployed locally with minimal resources thanks to efficient quantization techniques; the model can even run on consumer-grade graphics cards, catering to a wider audience. Its popularity has skyrocketed, particularly in China, and it has amassed over 2 million downloads worldwide, establishing itself as one of the most influential large-scale open-source models available. Recognizing this impact, the researchers at Tsinghua University have introduced ChatGLM2-6B, the second-generation iteration of the bilingual chat model. Building upon the strengths of its predecessor, ChatGLM2-6B incorporates a host of new features, including performance enhancements, support for longer contexts, and more efficient inference. Additionally, the model weights, previously restricted to academic use, are now available for commercial use.
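For readers who want to try local deployment, the following is a minimal sketch of how ChatGLM2-6B can be loaded with 4-bit quantization through the Hugging Face transformers library. It follows the usage pattern published in the THUDM/chatglm2-6b model repository; the quantize() and chat() helpers come from the model’s bundled custom code (hence trust_remote_code=True) and may change between releases, and the prompt text is purely illustrative.

```python
# Minimal local-inference sketch for ChatGLM2-6B, assuming the Hugging Face
# "THUDM/chatglm2-6b" checkpoint and its bundled custom modeling code.
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "THUDM/chatglm2-6b"

# trust_remote_code=True loads the repository's own ChatGLM implementation.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# quantize(4) converts the weights to 4-bit so the model can fit on a
# consumer-grade GPU; drop the call to run the model in FP16 instead.
model = (
    AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    .quantize(4)
    .cuda()
    .eval()
)

# The custom modeling code exposes a chat() helper that manages dialogue history.
response, history = model.chat(tokenizer, "What is the GLM framework?", history=[])
print(response)
```

Omitting the quantize(4) call runs the model at full half precision, which needs a correspondingly larger GPU; the quantized variant is what makes the consumer-grade deployment described above practical.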

To elevate the baseline performance, the researchers upgraded the ChatGLM2-6B base model relative to its first-generation counterpart. Leveraging the hybrid objective function of GLM, they pre-trained the model on a corpus of more than 1.4 trillion English and Chinese tokens. Comparative evaluations against models of similar scale revealed ChatGLM2-6B’s superior performance across datasets such as MMLU, CEval, and BBH.

Notably, ChatGLM2-6B introduces support for significantly longer contexts, expanding from 2K in the previous version to an impressive 32K. This was made possible by the FlashAttention algorithm, which accelerates the attention computation and reduces its memory consumption for longer sequences. Moreover, during dialogue alignment, the model was trained with a context length of 8K, affording users a greater depth of conversation. Finally, by applying the Multi-Query Attention technique, ChatGLM2-6B substantially reduces the GPU memory footprint of the KV cache and boosts inference speed by approximately 42% compared to its predecessor.
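As a rough illustration of why Multi-Query Attention (MQA) shrinks the KV cache, the toy PyTorch sketch below compares the cache size of standard multi-head attention with MQA, where all query heads share a single key/value head. This is not the ChatGLM2-6B implementation; the head counts and dimensions are assumptions chosen only to make the arithmetic concrete.

```python
# Toy comparison of KV-cache size: standard multi-head attention vs.
# multi-query attention (MQA). Shapes and names are illustrative only.
import torch
import torch.nn.functional as F

batch, num_heads, head_dim = 1, 32, 128
context_len = 32_768  # a 32K context, as supported by ChatGLM2-6B

# Standard multi-head attention: every head caches its own K and V tensors.
mha_cache = 2 * batch * context_len * num_heads * head_dim
# Multi-query attention: one shared K/V head serves all query heads.
mqa_cache = 2 * batch * context_len * 1 * head_dim
print(f"MHA KV-cache elements: {mha_cache:,}")
print(f"MQA KV-cache elements: {mqa_cache:,} ({num_heads}x smaller)")

# One decoding step with MQA: the queries keep all heads, while the single
# K/V head is broadcast (expanded) across the query heads.
cached_len = 1024
q = torch.randn(batch, num_heads, 1, head_dim)      # current token's queries
k = torch.randn(batch, 1, cached_len, head_dim)     # shared key cache
v = torch.randn(batch, 1, cached_len, head_dim)     # shared value cache
out = F.scaled_dot_product_attention(
    q, k.expand(-1, num_heads, -1, -1), v.expand(-1, num_heads, -1, -1)
)
print(out.shape)  # torch.Size([1, 32, 1, 128])
```

Because the key and value tensors are stored once rather than once per head, the per-token cache shrinks by a factor equal to the number of attention heads, which is what frees GPU memory and speeds up decoding at long context lengths.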

In a bid to foster growth and innovation in LLMs, the researchers at Tsinghua University have generously open-sourced ChatGLM2-6B. Their intent is to encourage developers and researchers worldwide to capitalize on the model’s potential, facilitating the development of useful applications. However, the researchers also acknowledge that, given the model’s relatively smaller scale, its outputs may be influenced by randomness, necessitating careful fact-checking for accuracy. Looking ahead, the team has already set its sights on the future, with work underway on ChatGLM3, the third iteration of this remarkable model. With each iteration, the boundaries of what is possible in natural language conversation agents continue to expand, promising an exciting future for this field of research and application.

Conclusion:

The release of ChatGLM2-6B marks a significant advancement in the market for bilingual chat models. Its performance improvements, longer context support, and resource efficiency make it an appealing choice for developers and researchers. The availability of model weights for commercial use further expands its potential applications. Tsinghua University’s commitment to open-source development fosters collaboration and innovation in the broader language model community, setting the stage for future advancements such as ChatGLM3.

Source