
Multimodal Output Generation to Assist Blind People for Data Exploration and Analysis

Primary supervisor

Lizhen Qu

Research area

Vision and Language

In the big-data era, the proliferation of data and the widespread adoption of data analytics have made data literacy a requisite skill for all professions, not just specialist data scientists. At the core of data literacy is the ability to detect patterns and trends, or to identify outliers and anomalies, in data. However, these tasks typically rely on visualisations, which creates a severe accessibility barrier for blind people. As part of a project funded by the Australian Research Council, this PhD project aims to devise novel techniques to build a multimodal natural language (NL) generator – an essential component of a conversational agent that supports blind people in achieving data literacy.
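As a minimal illustration of the audio side of such a generator (not part of the project description), consider turning a simple statistical check into a sentence a screen reader could speak. The function name, the z-score rule, and the threshold below are illustrative assumptions, not techniques prescribed by the project.

```python
import statistics

def describe_outliers(label, values, z_threshold=2.0):
    """Return a plain-language summary of outliers in a numeric series,
    suitable for audio (screen-reader) output."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return f"All {len(values)} values of {label} are identical ({mean})."
    # Flag values more than z_threshold standard deviations from the mean.
    outliers = [v for v in values if abs(v - mean) / stdev > z_threshold]
    if not outliers:
        return f"{label}: no notable outliers among {len(values)} values."
    listed = ", ".join(str(v) for v in outliers)
    return (f"{label}: {len(outliers)} outlier(s) detected, more than "
            f"{z_threshold} standard deviations from the mean {mean:.1f}: {listed}.")

print(describe_outliers("monthly sales", [10, 12, 11, 13, 12, 95]))
```

A real system would of course go far beyond this sketch, coordinating such verbal summaries with a tactile rendering of the same data.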

In this project, candidates will use a new-generation dynamic tactile display as a modality in addition to audio. Candidates will be expected to devise novel multimodal generation models by incorporating ideas and techniques from a range of areas, such as causality, deep learning, deep reinforcement learning, and traditional multimodal and NL generation. As there is no rich collection of dialogues available for training and evaluation, the candidate is expected to devise techniques for low-resource settings. Candidates will also be expected to engage in a participatory research approach, involving blind and low-vision end users as well as sector professionals.


References

Cheng, W., Luo, Z., and Yin, Q. Adaptive Prior-Dependent Correction Enhanced Reinforcement Learning for Natural Language Generation. In Proceedings of the AAAI Conference on Artificial Intelligence, 35(14), 2021.

Seitzer, M., Schölkopf, B., and Martius, G. Causal Influence Detection for Improving Efficiency in Reinforcement Learning. In NeurIPS, 2021.

Zhang, S., Qu, L., You, S., Yang, Z., and Zhang, J. Automatic Generation of Grounded Visual Questions. In IJCAI, pages 4235–4243, 2017.

Andre, E. and Pelachaud, C. Interacting with Embodied Conversational Agents. In F. Chen and K. Jokinen, editors, Speech Technology: Theory and Applications, pages 123–149. Springer, 2010.

Biran, O. and McKeown, K. Human-Centric Justification of ML Predictions. In IJCAI 2017, pages 1461–1467, 2017.

Cavazos Quero, L. et al. Jido: A Conversational Tactile Map for Blind People. In SIGACCESS 2019, pages 682–684, 2019.

Fast, E. et al. Iris: A Conversational Agent for Complex Tasks. In CHI 2018, pages 1–12. ACM, 2018.

Yang, Y., Marriott, K., Butler, M., Goncu, C., and Holloway, L. Tactile Presentation of Network Data: Text, Matrix or Diagram? In CHI 2020, pages 1–12, 2020.

Zukerman, I. et al. Exploratory Interaction with a Bayesian Argumentation System. In IJCAI 1999, pages 1294–1299, 1999.

Required knowledge

Candidates are expected to have a solid background in machine learning and language technology. Preference will be given to candidates who have strong written and oral communication skills, as well as strong programming skills. It is desirable that candidates already have research experience in at least one of the following areas: deep learning, deep reinforcement learning, causality, and natural language generation.

Project funding

