Primary supervisor
Trang Vu

Large language models (LLMs) have recently made significant progress in machine translation quality [1], but they still struggle to maintain consistency and accuracy across entire documents. Professional translators commonly use translation memory (TM) tools to reuse past translations, ensuring consistent terminology and phrasing throughout a document. Inspired by recent work such as DelTA [2], a document-level translation agent with a multi-level memory architecture, and HiMATE [3], a multi-agent evaluation framework leveraging the fine-grained MQM error typology, this project seeks to bridge the gap between LLMs and traditional TM systems. The goal is to enhance domain-specific accuracy and document-level coherence in LLM-based translation by intelligently incorporating a translation memory mechanism.
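To make the idea concrete, the sketch below shows one possible way a TM mechanism could be combined with LLM prompting: fuzzy-match retrieval over past translation pairs, whose results are prepended as in-context examples. This is illustrative only; the toy in-memory TM, the function names, the language pair, and the similarity threshold are all assumptions, and a real system would use a proper TM store and the project's chosen LLM.

```python
# Minimal sketch of TM-augmented prompting (all names and data hypothetical).
from difflib import SequenceMatcher

# Toy translation memory: (source, target) pairs from past translations.
TM = [
    ("The patient reported mild headaches.",
     "Le patient a signalé de légers maux de tête."),
    ("Take one tablet twice daily.",
     "Prenez un comprimé deux fois par jour."),
]

def retrieve_fuzzy_matches(source, tm, k=2, threshold=0.4):
    """Return up to k TM entries whose source side is most similar to `source`."""
    scored = [(SequenceMatcher(None, source, s).ratio(), s, t) for s, t in tm]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [(s, t) for score, s, t in scored if score >= threshold][:k]

def build_prompt(source, matches):
    """Prepend retrieved TM pairs as in-context examples for the LLM."""
    lines = ["Translate English to French, staying consistent "
             "with these past translations:"]
    for s, t in matches:
        lines.append(f"EN: {s}\nFR: {t}")
    lines.append(f"EN: {source}\nFR:")
    return "\n\n".join(lines)

source = "The patient should take two tablets daily."
prompt = build_prompt(source, retrieve_fuzzy_matches(source, TM))
print(prompt)  # send `prompt` to any instruction-tuned LLM
```

A practical design question the project would need to answer is how retrieval is scored (surface similarity, as above, versus embedding-based matching) and how retrieved pairs are weighted against the document-level context.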
Student cohort
Aim/outline
- An open-source codebase
- A publication at NLP venues such as ACL/EMNLP/NAACL/EACL
URLs/references
[1] Wu et al. 2024. (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts. TACL.
[2] Wang et al. 2025. DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory. ICLR 2025.
[3] Zhang et al. 2025. HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation. arXiv:2505.16281.
Required knowledge
- Must: fluency in Python and PyTorch
- Must: academic or working knowledge of large language models and machine translation
- Must: solid grasp of basic machine learning concepts (both theory and hands-on)
- Preferred: experience fine-tuning a small language model (e.g., LLaMA)
- Preferred: interest in pursuing a PhD