Primary supervisor
Teresa Wang
Co-supervisors
- Yuan-Fang Li
- Derry Wijaya
- Mohammed Eunus Ali
Research area
Vision and Language

While large multimodal models (LMMs) have achieved strong performance on many multimodal tasks, they may still hallucinate when generating text. How well they detect salient features in visual data is also unclear. In this project, we develop a framework for generating faithful and salient text from mixed-modal data, which includes images and structured data.