Equipped with a self-attention mechanism that excels at capturing long-range dependencies, Transformer-based models have achieved significant breakthroughs in many computer vision (CV) and natural language processing (NLP) tasks, such as machine translation and image classification. However, the strong performance of Transformers comes at a high computational cost. For example, a single Transformer model requires more than 10G Mult-Adds to translate a sentence of only 30 words. Such computational complexity hinders the widespread adoption of Transformers, especially on resource-constrained devices such as smartphones. In this project, we will explore efficient and scalable Transformers for NLP tasks, which is of great practical value.
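To make the cost concrete, below is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation behind the numbers above. The weight matrices `Wq`, `Wk`, `Wv`, the sequence length `n = 30`, and the model width `d = 512` are illustrative assumptions, not the setting of any particular model; the point is that the attention matrix is `(n, n)`, so compute and memory grow quadratically with sequence length.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (single head, no masking).

    The Q @ K.T product forms an (n, n) matrix for sequence length n,
    so both compute and memory scale quadratically with n.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n, n) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # (n, d_k) output

# Rough Mult-Add count for one attention layer (illustrative):
# projections ~ 3*n*d*d, scores ~ n*n*d, weighted sum ~ n*n*d.
rng = np.random.default_rng(0)
n, d = 30, 512                                       # e.g., a 30-word sentence
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Most "efficient Transformer" work targets exactly the quadratic `(n, n)` term, e.g., by sparsifying or approximating the attention matrix.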
To achieve a promising trade-off between efficiency and accuracy for NLP/CV Transformers.
basic math foundations, e.g., linear algebra, calculus, statistics and so on
NLP/CV background, e.g., machine translation
programming skills, e.g., PyTorch, TensorFlow
basic machine learning knowledge, e.g., SVMs, kernel methods, regression and so on
basic deep learning knowledge, e.g., CNNs, Transformers