Primary supervisor: Lizhen Qu
Research area: Vision and Language
Synthetic data generation has drawn growing attention because many application domains lack training data. It is useful for privacy-sensitive applications, e.g. digital health applications based on electronic medical records. It is also attractive for novel applications, e.g. multimodal applications in the metaverse, which have little data for training and evaluation. This project focuses on synthetic data generation for audio and the corresponding multimodal applications, such as mental health chatbots and digital assistants for negotiations.
For most such applications, the key technical challenge is to create disentangled representations for paralinguistic information and the content of speech. Here, each component of a disentangled representation contains only the information necessary for its relevant attributes or properties. General composition mechanisms will be learned so that applications can combine appropriate components to generate the desired data. For example, the script of an emotional support conversation can be combined with the desired emotion and voice patterns to generate natural and empathetic speech. However, current deep generative models perform poorly at compositional generalization (Hupkes et al., 2020). Recent work shows that disentangled representations can be defined from a causal perspective (Wang and Jordan, 2021). This is also relevant to causal representation learning, which aims for robustness and strong out-of-distribution generalization (Schölkopf et al., 2021). If user-specific information can be identified and removed from the input data, the devised techniques can also be applied to privacy-sensitive applications, such as privacy-preserving automatic speech recognition (ASR).
Wang, Yixin, and Michael I. Jordan. "Desiderata for representation learning: A causal perspective." arXiv preprint arXiv:2109.03795 (2021).
Schölkopf, Bernhard, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. "Toward causal representation learning." Proceedings of the IEEE 109, no. 5 (2021): 612-634.
Hupkes, Dieuwke, Verna Dankers, Mathijs Mul, and Elia Bruni. "Compositionality decomposed: how do neural networks generalise?" Journal of Artificial Intelligence Research 67 (2020): 757-795.