Researchers at the University of Maryland and Heidelberg University improve AI translation through light-weight feedback

  • Researchers at the University of Maryland and Heidelberg University explored the impact of light-weight feedback on AI translation.
  • Feedback types included generic, score-based, and fine-grained approaches, with fine-grained feedback proving most effective.
  • The study introduced a two-step process using human error markings to guide AI models in correcting translations.
  • Results showed significant improvement in translation quality over traditional methods like translation from scratch and automatic post-editing.
  • LLMs responded positively to error markings, demonstrating a 68% accuracy rate in corrections based on human evaluation.

Main AI News:

Researchers have enhanced AI translation by integrating light-weight feedback from translators. In April 2024, Dayeon Ki and Marine Carpuat of the University of Maryland showed that providing external feedback to large language models (LLMs) improves their machine translation post-editing (MTPE) capabilities. Their study explored several forms of feedback, including generic, score-based, and fine-grained approaches, with the fine-grained variety showing the most promise.
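As a rough illustration, the sketch below shows how these three feedback granularities might be phrased as prompts. The wording and the example segment are hypothetical; only the three categories themselves come from the study.

```python
# Illustrative prompts for the three feedback granularities (generic,
# score-based, fine-grained). Wording and the example segment are
# hypothetical; only the three categories come from the study.

source = "Das Team hat den Fehler behoben."
draft = "The team has fixed the bug mistake."

# Generic feedback: the model is only told the translation needs improvement.
generic = (
    "Improve the following translation.\n"
    f"Source: {source}\nTranslation: {draft}\nImproved translation:"
)

# Score-based feedback: a quality score hints at how bad the draft is.
score_based = (
    "The following translation received a quality score of 62/100. Improve it.\n"
    f"Source: {source}\nTranslation: {draft}\nImproved translation:"
)

# Fine-grained feedback: the erroneous span and error type are named.
fine_grained = (
    "The following translation contains an error: the span 'bug mistake' "
    "is a mistranslation. Correct it.\n"
    f"Source: {source}\nTranslation: {draft}\nImproved translation:"
)
```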

Building on this work, Nathaniel Berger and Stefan Riezler from Heidelberg University, together with Miriam Exel and Matthias Huck from SAP SE, demonstrated the effectiveness of “light-weight” feedback in guiding LLMs to self-correct translations, particularly in technical domains where LLM performance lags behind its general-domain level. Their proposed method is a two-step process that uses human feedback in the form of error markings to enhance LLM capabilities.

In the first step, translators identify and mark errors in machine-generated translations using inline <bad></bad> tags. The error-marked segments then prompt the LLM to correct the flagged spans by referencing similar examples from a post-editing translation memory (PE-TM). The PE-TM contains source segments, machine translations enriched with human error markings, and reference translations. By learning from these instances, LLMs can improve the quality of their translations.
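The paper describes this prompting setup at a high level; the following is a minimal sketch of how a few-shot correction prompt could be assembled from PE-TM entries. The field names, helper function, and prompt wording are assumptions for illustration; only the <bad></bad> markup convention is taken from the study.

```python
# A minimal sketch of assembling a few-shot post-editing prompt from a
# post-editing translation memory (PE-TM). Field names and prompt wording
# are assumptions; the <bad></bad> error spans follow the paper's
# marking convention.

from dataclasses import dataclass

@dataclass
class PETMEntry:
    source: str     # original English segment
    marked_mt: str  # machine translation with <bad>...</bad> error spans
    reference: str  # corrected (post-edited) translation

def build_prompt(examples: list[PETMEntry], source: str, marked_mt: str) -> str:
    """Prepend PE-TM examples, then ask the model to fix the marked errors."""
    parts = []
    for ex in examples:
        parts.append(
            f"Source: {ex.source}\n"
            f"Translation with marked errors: {ex.marked_mt}\n"
            f"Corrected translation: {ex.reference}\n"
        )
    parts.append(
        f"Source: {source}\n"
        f"Translation with marked errors: {marked_mt}\n"
        f"Corrected translation:"
    )
    return "\n".join(parts)

# Example: a translator has marked one wrong term in each draft MT.
examples = [
    PETMEntry(
        source="Click the Save button to apply your changes.",
        marked_mt="Klicken Sie auf die Schaltfläche <bad>Sparen</bad>, "
                  "um Ihre Änderungen zu übernehmen.",
        reference="Klicken Sie auf die Schaltfläche Speichern, "
                  "um Ihre Änderungen zu übernehmen.",
    )
]
prompt = build_prompt(
    examples,
    source="Restart the service after updating the configuration file.",
    marked_mt="Starten Sie den <bad>Service</bad> nach dem Aktualisieren "
              "der Konfigurationsdatei neu.",
)
print(prompt)
```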

To validate the approach, the researchers conducted a pilot study in the IT domain on the English-German language pair. Data from open-source software documentation, annotated by professional translators, were used to create the PE-TM. Llama 13B and GPT-3.5 were employed to generate and correct translations across three tasks: machine translation from scratch, automatic post-editing, and post-editing with error markings.
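Schematically, the three tasks differ only in what the prompt exposes to the model. The sketch below contrasts them; the prompt wording and example sentences are illustrative, not the study's actual templates.

```python
# The three evaluated conditions differ only in the information the prompt
# exposes to the model. Prompt wording and examples are illustrative.

source = "Open the terminal and run the installer."
mt_draft = "Öffnen Sie das Terminal und führen Sie den Installateur aus."
mt_marked = "Öffnen Sie das Terminal und führen Sie den <bad>Installateur</bad> aus."

tasks = {
    # 1. Translation from scratch: the model sees only the source.
    "from_scratch": f"Translate into German:\n{source}",
    # 2. Automatic post-editing: the model sees source plus the unmarked draft.
    "auto_post_edit": (
        f"Improve this German translation.\nSource: {source}\nDraft: {mt_draft}"
    ),
    # 3. Post-editing with error markings: bad spans are flagged for the model.
    "marked_post_edit": (
        f"Correct the marked errors.\nSource: {source}\nDraft: {mt_marked}"
    ),
}

for name, prompt in tasks.items():
    print(f"--- {name} ---\n{prompt}\n")
```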

The study demonstrated that providing error markings significantly enhances LLMs’ ability to correct translations, surpassing both translation from scratch and automatic post-editing approaches. “Overall translation quality is improved over few-shot prompt-based translation and over automatic post-editing,” the researchers affirmed.

Moreover, the researchers observed that while LLMs typically treat their own translations as correct and do not self-correct unprompted, they learn to act on error markings. Human evaluation indicated that 68% of the corrections made in response to error markings were accurate, compared with only 32% under automatic post-editing.

Conclusion:

Integrating light-weight feedback into AI translation processes, as demonstrated by recent studies, represents a significant advancement. By leveraging human feedback to enhance the self-correction capabilities of large language models (LLMs), the industry can expect improved accuracy and efficiency in technical translations. This approach not only outperforms traditional methods like translation from scratch but also underscores the growing importance of human-AI collaboration in refining language processing technologies.

Source