
Improving audiovisual sensing functions of a social robot

Primary supervisor

Leimin Tian


Co-supervisors
  • Dr Pamela Carreno (Engineering)

For a robot to communicate and interact with people, it requires the capability to understand audiovisual inputs, such as speech, gestures, and facial expressions, and to generate natural, timely, and expressive responses through these modalities. Current audiovisual sensing and generation functions of robots can benefit from leveraging advances in deep learning approaches reported in multimodal behavioural analysis research. In this project, your objective is to apply state-of-the-art audiovisual sensing and generation models to improve the social communicative functions of a Pepper robot.

Student cohort

Double Semester


Basic goals:

  1. Developing multimodal recognition models for human emotions and behaviours
  2. Evaluating the performance of the multimodal recognition models on existing human-robot interaction datasets
  3. Applying the multimodal recognition models to the Pepper robot
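As a starting point for the first two goals, a common baseline for combining speech and facial-expression cues is decision-level (late) fusion, where each modality's classifier produces scores that are averaged into a joint prediction. The sketch below illustrates the idea in plain Python; the emotion label set, logit values, and fusion weight are all illustrative assumptions, not part of the project brief:

```python
import math

EMOTIONS = ["happy", "sad", "angry", "neutral"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(audio_logits, visual_logits, audio_weight=0.5):
    """Decision-level fusion: weighted average of the per-modality
    probability distributions over the emotion classes."""
    p_audio = softmax(audio_logits)
    p_visual = softmax(visual_logits)
    return [audio_weight * a + (1 - audio_weight) * v
            for a, v in zip(p_audio, p_visual)]

# Toy example: hypothetical scores where the speech model leans
# towards "sad" and the vision model towards "neutral".
audio_logits = [0.1, 2.0, 0.3, 1.2]
visual_logits = [0.2, 0.8, 0.1, 2.5]

fused = late_fusion(audio_logits, visual_logits)
prediction = EMOTIONS[fused.index(max(fused))]
print(prediction)  # the modality with the more confident distribution dominates
```

In practice the per-modality scores would come from trained deep models (e.g. a speech emotion network and a facial expression network), and the fusion weight can be tuned on a validation set; feature-level (early) fusion is the usual alternative to compare against.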

Possible extensions:

  1. Conducting human-robot interaction experiments to evaluate the performance of the multimodal recognition models
  2. Developing behaviour generation models based on features identified by the recognition models

Required knowledge

Python programming, deep learning