
Primary supervisor

Thanh Thi Nguyen

This project aims to develop techniques that let users find relevant audio content by entering text queries. The approach uses machine learning models, drawing on natural language processing and audio signal processing, to bridge the gap between text and audio. When a user submits a query, the system analyses the text to understand its intent and context. It then searches a database of audio files, using techniques such as keyword extraction, semantic understanding, and speech recognition, to match the query with relevant audio clips. Recent state-of-the-art deep learning methods will be reviewed to identify their strengths and weaknesses, and evaluated empirically to assess their performance in practical applications and to suggest potential improvements. Such techniques make it more efficient to locate specific sounds, speech, or music within large audio collections, which is useful across many applications.
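
As a rough illustration of the matching step described above, the following minimal sketch ranks audio clips against a text query. It assumes each clip already has a short text description (for example, from tags or a speech recogniser), and it uses a simple bag-of-words cosine similarity as a stand-in for the learned text and audio embeddings the project would actually investigate. The clip names, descriptions, and the function rank_clips are hypothetical and are not part of the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clips(query, clip_texts, top_k=3):
    """Return up to top_k (clip_id, score) pairs, best match first.

    clip_texts maps a clip identifier to its text description.
    Clips with zero similarity to the query are omitted.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), clip_id)
        for clip_id, text in clip_texts.items()
    ]
    # Sort by descending score, breaking ties by clip identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(clip_id, score) for score, clip_id in scored[:top_k] if score > 0]


# Hypothetical toy collection: descriptions are assumed, not real data.
CLIPS = {
    "clip_001.wav": "dog barking in a park",
    "clip_002.wav": "piano music slow melody",
    "clip_003.wav": "speech about weather forecast rain",
}

if __name__ == "__main__":
    for clip_id, score in rank_clips("dog barking", CLIPS):
        print(clip_id, round(score, 3))
```

In a deep learning approach, the word-count vectors would typically be replaced by embeddings from trained text and audio encoders, while the ranking by similarity would stay broadly the same.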

Student cohort

Single Semester
Double Semester

Required knowledge

Python programming

Machine learning background