Tower: A Multilingual 7B Parameter LLM for Translation Tasks

TL;DR:

  • Tower, a multilingual LLM with 7B parameters, is designed for translation-related tasks.
  • It supports 10 languages, including English, German, French, Spanish, Chinese, Portuguese, Italian, Russian, Korean, and Dutch.
  • Tower handles translation as well as pre-translation tasks such as grammar correction and assessment tasks such as automatic post-editing.
  • It outperforms state-of-the-art counterparts, including ALMA 13B and LLaMA-2 70B.
  • Tower’s development involved extended pre-training with a dataset of 20 billion tokens and instruction tuning using the TowerBlocks dataset.

Main AI News:

Large language models such as GPT-3.5, LLaMA, and Mixtral have made it far easier to tackle a wide range of language tasks. Yet amid this rapid progress, one conspicuous gap has persisted: the lack of dependable open-source models built specifically for translation. Tower is a direct response to that gap.

Enter Tower, a collaborative effort between Unbabel, the SARDINE Lab at Instituto Superior Técnico, and the MICS lab at CentraleSupélec, University of Paris-Saclay. Built on the Llama 2 architecture, Tower has 7B parameters optimized for translation-centric tasks. What sets it apart from most open-source models, which rely predominantly on English data, is its breadth of language coverage: Tower supports 10 languages, namely English, German, French, Spanish, Chinese, Portuguese, Italian, Russian, Korean, and Dutch.

But Tower doesn’t stop at translation itself. It also covers pre-translation tasks such as grammar correction, and translation assessment tasks, including machine translation evaluation and automatic post-editing. In the research team’s extensive evaluations, Tower consistently outperforms state-of-the-art counterparts on translation tasks and surpasses alternative open-source models, notably ALMA 13B and LLaMA-2 70B.

Tower was created in two distinct stages: continued pre-training and instruction tuning. Continued pre-training was chosen to raise Llama 2’s proficiency in non-English languages, while instruction tuning refined the model’s ability to solve specific tasks zero-shot, without task-specific examples at inference time. The continued pre-training corpus comprises 20 billion tokens, distributed evenly across the supported languages; the majority of these tokens come from monolingual data, with the remainder drawn from publicly available bilingual datasets such as OPUS.
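The data mix above can be sketched as a simple budget calculation. The 20-billion-token total and the even 10-way language split come from the article; the monolingual/bilingual ratio below is an illustrative assumption, not a figure from the Tower release.

```python
# Sketch of the continued pre-training data budget described above.
# LANGUAGES and TOTAL_TOKENS reflect the article; MONO_RATIO is assumed.
LANGUAGES = ["en", "de", "fr", "es", "zh", "pt", "it", "ru", "ko", "nl"]
TOTAL_TOKENS = 20_000_000_000   # 20B tokens, split evenly across languages
MONO_RATIO = 2 / 3              # assumed share drawn from monolingual corpora


def token_budget(total: int, languages: list[str], mono_ratio: float) -> dict:
    """Return per-language token budgets, split into monolingual and
    bilingual (e.g. OPUS parallel data) portions."""
    per_lang = total // len(languages)
    mono = int(per_lang * mono_ratio)
    return {
        lang: {"monolingual": mono, "bilingual": per_lang - mono}
        for lang in languages
    }


budgets = token_budget(TOTAL_TOKENS, LANGUAGES, MONO_RATIO)
print(budgets["de"])
```

Keeping the per-language budget fixed and varying only the monolingual/bilingual split makes it easy to experiment with how much parallel data each language receives.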

The second stage, instruction tuning, was pivotal in sharpening the model’s ability to tackle diverse tasks, even in a zero-shot setting. It introduced TowerBlocks, a dataset built for supervised fine-tuning that combines code instructions, conversational data, and task-specific records, along with prompt templates for both zero- and few-shot use. This resource is what allows the model to maintain competency across the full range of translation-related tasks. With these capabilities and its multilingual coverage, Tower has established itself as a serious contender among translation-focused language models.

Conclusion:

Tower, a multilingual 7B-parameter LLM optimized for translation tasks, marks a significant step in the language model landscape. With support for 10 languages and strong translation performance, it is well positioned to reshape the translation market. Its capabilities across pre-translation tasks and translation assessment make it a valuable tool for businesses operating in multilingual environments, improving both efficiency and accuracy in language-related work. Its two-stage development process, continued pre-training followed by instruction tuning, reflects a deliberate approach to these challenges. Businesses seeking robust language solutions should consider integrating Tower into their operations to gain a competitive edge in a globalized world.
