NLP analysis of official discourse in contemporary China

Primary supervisor

Yuan-Fang Li

Co-supervisors

  • Prof. Robert Thomson (Monash)
  • Assoc. Prof. Delia Lin (University of Melbourne)

This multidisciplinary project combines cutting-edge Natural Language Processing (NLP), Chinese Studies and Political Science. The project aims to develop a deeper understanding of how official discourse has developed throughout the history of the People’s Republic of China. The main focus will be on text in the People’s Daily, the largest newspaper in China and the official newspaper of the Chinese Communist Party. The People’s Daily plays a major role in the Chinese government’s communication with the public, and it is therefore a key resource for understanding the process of governance.

The project draws on insights from previous qualitative Social Science research in Chinese Studies and Political Science, by researchers who understand the Chinese context. Previous research in Chinese Studies has examined how key concepts with deep cultural resonance form part of the government’s discourse. For instance, the concept of suzhi (素质), which is approximately translated as ‘quality’ citizenship, has been used repeatedly throughout the recent history of Chinese governance in different contexts and policy areas, and with subtly different and changing meanings. Qualitative research has revealed insights into these changing meanings. NLP methods that build on these insights have the potential to reveal further patterns and trends.

    The student will be part of a supportive multidisciplinary team that includes established and emerging researchers in the fields of NLP, Chinese Studies and Political Science. The main supervisors will be Dr. Yuan-Fang Li (Data Science, Monash) and Professor Robert Thomson (Political Science, Monash). Other members of the team will be Associate Professor Delia Lin (Chinese Studies, University of Melbourne), Ms. Yang Wang (Chinese Studies, University of Melbourne), and Mr. Xinwei Chen (Political Science, Monash).

    The student’s project will support this multidisciplinary research team by contributing to its ongoing work on key research questions about official discourse in China.

      Student cohort

      Double Semester

      Aim/outline

      Developing and implementing this project involves a series of well-defined but challenging tasks, including:

      • Build a comprehensive corpus of the original Chinese-language text of the People’s Daily from 1947 to the present.
      • Train a model to classify articles (and possibly sentences within articles) into broad policy issues (Macroeconomics, Domestic Commerce, Environment, Social Welfare, etc.). A framework for classifying broad policy issues has already been developed.
      • Extract articles and sentences that mention key concepts to be specified during the course of the project.
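
The extraction task in the last bullet could start from something as simple as the sketch below. It is only an illustration under stated assumptions, not the project's method: the article fields (`date`, `text`), the function names, and the sample sentences are all invented for this example, and sentence splitting is a naive rule based on Chinese sentence-final punctuation.

```python
import re

# Naive sentence pattern: a run of non-terminator characters followed by an
# optional Chinese sentence-final mark. Closing quotation marks after the
# terminator are not handled; a real pipeline would need a proper segmenter.
SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def split_sentences(text):
    """Split Chinese text into sentences on 。！？ (simplified rule)."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def extract_mentions(articles, concepts):
    """Return one record per (sentence, concept) pair where the concept occurs.

    `articles` is a list of dicts with hypothetical keys "date" and "text".
    Matching is plain substring search, so it cannot distinguish different
    senses of the same term.
    """
    mentions = []
    for article in articles:
        for sentence in split_sentences(article["text"]):
            for concept in concepts:
                if concept in sentence:
                    mentions.append(
                        {
                            "date": article["date"],
                            "concept": concept,
                            "sentence": sentence,
                        }
                    )
    return mentions


# Invented sample text for illustration only; not taken from the People's Daily.
articles = [
    {"date": "1985-03-01", "text": "提高全民族的素质。今天天气很好。"},
    {"date": "2001-06-15", "text": "素质教育是重点！经济持续增长。"},
]
mentions = extract_mentions(articles, ["素质"])
for m in mentions:
    print(m["date"], m["concept"], m["sentence"])
```

Keeping the date alongside each matched sentence makes it possible to group mentions by period later, which is the kind of view needed to study how a concept's usage shifts over time. Substring matching is a deliberate simplification here; a classifier or word segmenter would be a natural next step.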

      Required knowledge

      The student will need experience with and knowledge of the following areas:

      • Data science, natural language processing, deep learning
      • Python programming, especially familiarity with data science and data processing libraries (e.g. PyTorch, TensorFlow, spaCy)
      • Knowledge of China, fluency in Mandarin, and an interest in Chinese governance and society.