Extreme Multi-label Text Classification with Metadata and Pretrained Knowledge

Primary supervisor

Ethan Zhao

Co-supervisors

In multi-label classification, a data sample is associated with more than one active label, which makes the task more challenging than conventional single-label classification. This project focuses on eXtreme Multi-Label (XML) classification of text data (i.e., documents), where the label set can be extremely large, e.g., more than 10,000 labels. For example, the input texts can be the item descriptions on an e-commerce website (e.g., Amazon), which need to be classified into a large set of item categories. The project is to develop novel machine learning and deep learning models for XML of text data by leveraging the metadata of documents and the knowledge in pretrained language models.
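To make the task setup concrete, the following is a minimal sketch in plain Python, not the project's method. It shows how a document's label set might be stored as a sparse list of active label ids (a dense 0/1 vector would be almost entirely zeros when there are more than 10,000 labels) and how predictions could be scored with precision@k, a metric commonly reported in the extreme classification literature. The toy categories, the scores, and the helper names (`build_label_index`, `to_sparse_ids`, `precision_at_k`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def build_label_index(label_sets: List[Set[str]]) -> Dict[str, int]:
    """Map each distinct label string to an integer id (sorted for determinism)."""
    all_labels = sorted(set().union(*label_sets))
    return {label: i for i, label in enumerate(all_labels)}


def to_sparse_ids(labels: Set[str], index: Dict[str, int]) -> List[int]:
    """Keep only the ids of the active labels instead of a dense 0/1 vector."""
    return sorted(index[label] for label in labels)


def precision_at_k(scores: List[float], active_ids: List[int], k: int) -> float:
    """Fraction of the k highest-scoring labels that are truly active."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    active = set(active_ids)
    return sum(1 for i in ranked[:k] if i in active) / k


# Toy data (hypothetical): each item description has several active categories.
label_sets = [
    {"electronics", "audio", "headphones"},
    {"electronics", "cameras"},
]
index = build_label_index(label_sets)
doc0_ids = to_sparse_ids(label_sets[0], index)

# Hypothetical model scores for document 0, one score per label id.
scores = [0.9, 0.8, 0.1, 0.3]
p_at_2 = precision_at_k(scores, doc0_ids, k=2)
```

In this toy case the two top-scoring labels are "audio" (active) and "cameras" (not active for document 0), so precision@2 is 0.5. A real XML model would produce the scores from the document text, and possibly its metadata, rather than from a hand-written list.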

Aim/outline

We aim to propose a new method for XML of text data. The primary goal is to publish the proposed method in top machine learning, data mining, and natural language processing venues (i.e., conferences ranked A* or A by CORE). The second goal is to develop a demo and a research code package to accompany the publications. This project is particularly suitable for students who intend to pursue a research degree in machine learning and deep learning. The planned publications are expected to strengthen their future PhD applications.

URLs/references

https://arxiv.org/pdf/1905.02331.pdf

http://manikvarma.org/downloads/XC/XMLRepository.html

Supervisors' websites:

https://ethanhezhao.github.io

http://dinhphung.ml/?i=1

Required knowledge

Proficiency in Python, especially TensorFlow and/or PyTorch
Foundations of machine learning and deep learning (e.g., FIT3181 or FIT5215)
Basic knowledge of probability and statistics
Prior familiarity with pretrained language models (e.g., BERT) is preferred but not required
