Primary supervisor
Xiaoning Du

Despite the rapid progress made in recent years, deep learning (DL) approaches remain data-hungry: they require a significantly large amount of labelled data to reach their optimum performance. Very often, unlabelled data is abundant, but acquiring labels is costly and difficult; many domains, such as medicine, require specialists to annotate the data samples. Data dependency has become one of the limiting factors for applying deep learning in many real-world scenarios. As reported, labelling ImageNet, one of the largest visual recognition datasets with millions of images in more than 20,000 categories, took more than 49,000 workers from 167 countries about 9 years. To make the training and evaluation of DL applications more efficient, there is an increasing need to make the most of limited resources and select the most valuable inputs for manual annotation. This project aims to address the problem of prioritizing error-revealing samples from a large set of unlabelled data across various DL tasks.
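For intuition, one common heuristic from the DL-testing literature ranks unlabelled inputs by the uncertainty of the model's own predictions, for example the Gini impurity of the softmax output (as in DeepGini). The sketch below is illustrative only, not the method proposed in this project; it assumes softmax probability vectors are already available as a NumPy array, and the names and shapes are hypothetical.

```python
# Illustrative sketch: DeepGini-style prioritization of unlabelled inputs.
# All names, shapes, and the annotation budget below are assumptions for
# demonstration, not part of the project description.
import numpy as np

def gini_scores(probs: np.ndarray) -> np.ndarray:
    """Score each sample by the Gini impurity of its predicted class distribution.

    probs: array of shape (n_samples, n_classes) holding softmax outputs.
    Higher scores mean less confident predictions, which are more likely
    to be error-revealing and thus better candidates for manual labelling.
    """
    return 1.0 - np.sum(probs ** 2, axis=1)

def prioritize(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` most uncertain samples, best first."""
    scores = gini_scores(probs)
    return np.argsort(-scores)[:budget]

if __name__ == "__main__":
    # Hypothetical model outputs: random logits turned into softmax probabilities.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Pick the 50 samples to send for annotation first.
    top = prioritize(probs, budget=50)
    print(top[:10])
```

Uncertainty scoring is only one family of prioritization signals; coverage-based and surprise-based metrics from the software-testing literature are alternatives the project could also explore.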
Student cohort
Required knowledge
Natural language processing, software testing