
Faithful and Salient Multimodal Data-to-Text Generation

Primary supervisor

Teresa Wang

Co-supervisors

Research area

Vision and Language

While large multimodal models (LMMs) achieve strong performance on many multimodal tasks, they can still hallucinate when generating text, and their ability to detect salient features in visual data remains unclear. This project will develop a framework for generating faithful and salient text from mixed-modal data comprising images and structured data.

