BigTranslate: Redefining Multilingual Translation with a Revolutionary Language Model


  • A team of Chinese NLP researchers introduces BigTranslate, a powerful large language model for multilingual translation.
  • BigTranslate enhances translation capabilities across over 100 languages, surpassing ChatGPT in nine language pairs.
  • In GPT-4-based evaluations, the model scores close to Google Translate, and it shows proficiency in low-resource languages like Tibetan and Mongolian.
  • BigTranslate’s release reflects a growing trend of open-sourced language models, with more than 200,000 models shared on platforms like GitHub and Hugging Face.
  • The translation landscape is evolving rapidly, with language models like BigTranslate transforming global communication and offering new possibilities for businesses.

Main AI News:

In a groundbreaking development, a team of NLP researchers from China recently unveiled BigTranslate, a cutting-edge large language model (LLM) that extends multilingual translation to more than 100 languages. The model, which has been made available on GitHub, promises to redefine the way businesses communicate on a global scale.

The researchers behind BigTranslate come from the Institute of Automation, Chinese Academy of Sciences; the School of Artificial Intelligence at the University of Chinese Academy of Sciences; and Wuhan AI Research. Building on LLaMA, the LLM introduced by Meta AI in February 2023, they designed BigTranslate as a single unified model that handles low-resource languages as well as high-resource ones.

The key to BigTranslate’s performance lies in its training approach, which combines a specific focus on Chinese with an extensive parallel dataset spanning 102 languages. Strengthening the model’s Chinese, a language LLaMA previously struggled to understand and translate accurately, is meant to give BigTranslate well-balanced competency across both high-resource and low-resource languages.

To build this multilingual proficiency, the team assembled a comprehensive parallel corpus covering all 102 languages, curated from a mix of public and proprietary sources. Because some language pairs are far better represented than others, the researchers also applied a data augmentation strategy that increases the share of underrepresented translation directions, yielding a more balanced corpus and, in turn, more accurate translations.

To gauge BigTranslate’s effectiveness, the researchers ran multilingual translation experiments across all 102 supported languages. In a head-to-head comparison with Google Translate and ChatGPT, BigTranslate achieved higher BLEU scores (BLEU being a widely used automatic measure of translation quality) than ChatGPT in nine language pairs.
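BLEU compares a system’s output with human reference translations by counting overlapping word n-grams. The following is a simplified, sentence-level sketch, not BigTranslate’s evaluation code: it assumes whitespace tokenization, a single reference, uniform weights over 1- to 4-grams, and no smoothing, and the function names are hypothetical. Published results are normally computed at the corpus level with standardized tokenization, for example with a toolkit such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Hypothetical, simplified sentence-level BLEU against one reference.

    Assumptions: whitespace tokenization, clipped n-gram precision for
    n = 1..max_n, a uniform geometric mean, the standard brevity penalty,
    and no smoothing (any n-gram order with zero matches gives a score of 0).
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by how often it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)

    geo_mean = math.exp(log_precision_sum / max_n)
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * geo_mean
```

A hypothesis identical to its reference scores 100.0. Scoring "the cat sat on the mat" against "the cat is on the mat" returns 0.0, because the two sentences share no 4-gram and this sketch applies no smoothing; that harshness on short sentences is why real toolkits smooth sentence-level scores or aggregate n-gram counts over a whole test set.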

The researchers then went a step further and ran automatic evaluations with GPT-4, which rated the semantic similarity and style consistency between the source text and the translated output. On this measure, BigTranslate’s results came close to those of Google Translate in numerous language pairs, strengthening its position as a serious contender in the global translation landscape.

Thanks to its proficiency in languages like Tibetan and Mongolian, BigTranslate is well placed to make a significant impact in the domestic Chinese market. Its capabilities open up new possibilities for businesses operating in this vibrant economy, supporting smoother communication and broader global outreach.

BigTranslate’s unveiling is just one of many groundbreaking advancements occurring in the field of language models. Platforms like GitHub and Hugging Face have become hotbeds for innovation, with researchers and developers enthusiastically sharing over 200,000 models to date. What’s more, the pace of progress shows no sign of slowing down, with approximately 5,000 new models being added every week.

Clement Delangue, the CEO of Hugging Face, emphasized the pivotal role that these models play in addressing the translation needs of low-resource languages during his testimony before the US Congress in late June 2023. The democratization of knowledge and the widespread availability of these models empower businesses and individuals alike to bridge linguistic barriers and engage with diverse markets.

The unveiling of BigTranslate and its success with Tibetan and Mongolian may be only the beginning. As this article goes to press, Alibaba Group, the Chinese conglomerate, has introduced PolyLM, a model designed to transfer general knowledge to low-resource languages while maintaining strong performance in high-resource languages. With many other models on the horizon, the future of translation looks bright, pointing toward a world where language is no longer a barrier to global business success.


The introduction of BigTranslate and its strong performance in multilingual translation mark a significant milestone for the translation market. The model’s ability to handle low-resource languages, outperform ChatGPT in nine language pairs, and come close to Google Translate raises the bar for open translation models. As more language models like BigTranslate become available, businesses can use these tools to communicate effectively across diverse markets, break down language barriers, and expand their global reach. The translation industry is undergoing a transformative shift that enables organizations to pursue global opportunities like never before.
