LLM-Based Translation Agent with Integrated Translation Memory

Primary supervisor

Trang Vu

Large language models (LLMs) have recently made significant progress in machine translation quality [1], but they still struggle to maintain consistency and accuracy across entire documents. Professional translators commonly use translation memory (TM) tools to reuse past translations, ensuring consistent terminology and phrasing throughout a document. Inspired by recent work such as DelTA [2], a document-level translation agent with a multi-level memory architecture, and HiMATE [3], a multi-agent evaluation framework leveraging the fine-grained MQM error typology, this project seeks to bridge the gap between LLMs and traditional TM systems. The goal is to enhance domain-specific accuracy and document-level coherence in LLM-based translation by intelligently incorporating a translation memory mechanism.
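To make the intended pipeline concrete, below is a minimal sketch of how a fuzzy translation memory could feed retrieved examples into an LLM prompt. The `TranslationMemory` class, the similarity threshold, and `build_prompt` are illustrative assumptions for this proposal, not the DelTA architecture or any existing library's API.

```python
# Minimal sketch: fuzzy TM retrieval feeding an LLM translation prompt.
# TranslationMemory and build_prompt are hypothetical illustrations,
# not part of DelTA, HiMATE, or any existing toolkit.

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class TranslationMemory:
    """Stores (source, target) pairs and retrieves fuzzy matches."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, target: str) -> None:
        self.entries.append((source, target))

    def lookup(self, query: str, threshold: float = 0.6, k: int = 3):
        """Return up to k past translations whose source resembles the query."""
        scored = [
            (SequenceMatcher(None, query.lower(), src.lower()).ratio(), src, tgt)
            for src, tgt in self.entries
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(src, tgt) for score, src, tgt in scored[:k] if score >= threshold]


def build_prompt(sentence: str, tm: TranslationMemory) -> str:
    """Assemble an LLM prompt that conditions on retrieved TM entries."""
    context = "\n".join(
        f"Source: {s}\nTarget: {t}" for s, t in tm.lookup(sentence)
    )
    return (
        "Translate the sentence below into German, keeping terminology "
        "consistent with these past translations:\n"
        f"{context}\n\nSentence: {sentence}\nTranslation:"
    )


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("The patient reported mild chest pain.",
           "Der Patient berichtete über leichte Brustschmerzen.")
    tm.add("Chest pain worsened after exercise.",
           "Die Brustschmerzen verschlimmerten sich nach dem Training.")
    print(build_prompt("The patient reported severe chest pain.", tm))
```

In a full agent, the prompt would be sent to an LLM and the resulting translation written back into the TM, so later sentences in the document reuse the same terminology; a learned retriever could replace the string-similarity matcher used here for illustration.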


Student cohort

Double Semester

Aim/outline

  • An open-source implementation of the translation agent
  • A publication at NLP venues such as ACL/EMNLP/NAACL/EACL

Required knowledge

  • Must: fluency in Python and PyTorch
  • Must: academic or working knowledge of large language models and machine translation
  • Must: solid grasp of basic machine learning concepts (both theory and hands-on)
  • Preferred: experience fine-tuning a small language model (e.g., LLaMA)
  • Preferred: interest in pursuing a PhD