Primary supervisor

Cagatay Goncu

Co-supervisors


Accessing videos is challenging for people who are blind or have low vision (BLV), particularly when it comes to creating audio descriptions that convey the scenes without interfering with the dialogue in a video. A further challenge is providing additional information through multi-modal feedback, that is, non-speech audio and haptics.

In this project, you will analyse videos using deep learning frameworks and extract information to generate multi-modal interactions that include audio descriptions, non-speech audio, and haptics. The input videos will include online lectures, movie clips, and media artworks, and the output will be audio descriptions that describe the background and foreground objects in the frames as well as the mood of the people in them.
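
As a rough illustration of one possible starting point (not the project's prescribed pipeline), the sketch below samples frames from a video with OpenCV and runs a pretrained object detector from torchvision to list foreground objects per frame; such detections could then feed an audio description generator. The model choice, the confidence threshold, and the input file name "lecture.mp4" are assumptions made only for this example.

import cv2
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# Load a pretrained detector and its matching preprocessing transform.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]
preprocess = weights.transforms()

def detect_objects(frame_bgr, score_threshold=0.7):
    """Return the labels of objects detected in a single video frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1)  # HWC uint8 -> CHW
    with torch.no_grad():
        prediction = model([preprocess(tensor)])[0]
    return [
        categories[label]
        for label, score in zip(prediction["labels"], prediction["scores"])
        if score >= score_threshold
    ]

capture = cv2.VideoCapture("lecture.mp4")  # hypothetical input video
frame_rate = capture.get(cv2.CAP_PROP_FPS) or 25
frame_index = 0
while True:
    ok, frame = capture.read()
    if not ok:
        break
    # Sample roughly one frame per second to keep the output sparse.
    if frame_index % int(frame_rate) == 0:
        objects = detect_objects(frame)
        timestamp = frame_index / frame_rate
        print(f"{timestamp:5.1f}s: {', '.join(objects) or 'no confident detections'}")
    frame_index += 1
capture.release()

A full system would go well beyond this sketch, for example by adding scene captioning and emotion recognition models, and by timing the generated descriptions around the dialogue track.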