Efficient Transformers for Computer Vision and Natural Language Processing

Primary supervisor

Bohan Zhuang


Equipped with a self-attention mechanism that is highly effective at capturing long-range dependencies, Transformer-based models have achieved significant breakthroughs in many computer vision (CV) and natural language processing (NLP) tasks, such as machine translation and image classification. However, this strong performance comes at a high computational cost. For example, a single Transformer model requires more than 10G Mult-Adds to translate a sentence of only 30 words. Such a heavy computational burden hinders the widespread adoption of Transformers, especially on resource-constrained devices such as smartphones. In this project, we will explore efficient and scalable Transformers for NLP tasks, which is of great practical value.
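
As a rough illustration of where this cost comes from, the sketch below counts multiply-adds for a single Transformer encoder layer. It is a back-of-the-envelope estimate under simplifying assumptions, not the method used to obtain the 10G figure above. The function names and the values d_model = 512 and d_ff = 2048 are illustrative and do not come from the project description. Softmax, layer normalisation, biases and embeddings are ignored.

```python
def attention_mult_adds(seq_len: int, d_model: int) -> int:
    """Approximate multiply-adds for one self-attention block (all heads combined)."""
    # Four dense projections (query, key, value, output), each d_model x d_model.
    projections = 4 * seq_len * d_model * d_model
    # Attention scores Q @ K^T: quadratic in the sequence length.
    scores = seq_len * seq_len * d_model
    # Weighted sum of values, softmax(scores) @ V: also quadratic in seq_len.
    weighted_sum = seq_len * seq_len * d_model
    return projections + scores + weighted_sum


def ffn_mult_adds(seq_len: int, d_model: int, d_ff: int) -> int:
    """Approximate multiply-adds for the two-layer feed-forward block."""
    return 2 * seq_len * d_model * d_ff


def encoder_layer_mult_adds(seq_len: int, d_model: int, d_ff: int) -> int:
    """Approximate multiply-adds for one encoder layer (attention + feed-forward)."""
    return attention_mult_adds(seq_len, d_model) + ffn_mult_adds(seq_len, d_model, d_ff)


if __name__ == "__main__":
    # Illustrative (assumed) model sizes; the sequence lengths are arbitrary.
    for n in (30, 300, 3000):
        total = encoder_layer_mult_adds(n, d_model=512, d_ff=2048)
        print(f"seq_len={n}: about {total / 1e6:.1f}M mult-adds per encoder layer")
```

For short sequences the projection and feed-forward terms dominate, while the two attention terms grow quadratically with sequence length. A full model multiplies this per-layer count by the number of layers, and a translation model adds decoder-side costs on top.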

Student cohort

Single Semester
Double Semester


The aim is to achieve a promising trade-off between efficiency and accuracy for NLP/CV Transformers.

Required knowledge


  • basic mathematical foundations, e.g., linear algebra, calculus and statistics

  • NLP/CV background, e.g., machine translation

  • programming skills, e.g., PyTorch, TensorFlow

  • basic machine learning knowledge, e.g., SVMs, kernel methods and regression

  • basic deep learning knowledge, e.g., CNNs, Transformers