
Evaluating Large Language Models in Automated Essay Scoring

Primary supervisor

Guanliang Chen

In education, writing is a prevalent pedagogical practice that teachers and instructors use to enhance student learning. Yet the timely evaluation of students' essays or responses is a formidable challenge that consumes considerable time and cognitive effort. To alleviate this burden, Automated Essay Scoring (AES) has emerged: the use of machine learning techniques to evaluate and assign scores to student-authored essays or responses. With this assessment automated, educators can focus on refining their teaching strategies, ultimately enabling a more efficient and effective learning experience for students.

Student cohort

Double Semester


This project aims to apply cutting-edge techniques that enable Large Language Models (e.g., BERT, LLaMA, and GPT-4) to perform automated essay scoring effectively in education. Two Kaggle datasets can be used: one for short answer scoring and one for essay scoring. Specific tasks include: (1) fine-tune Large Language Models to perform automated essay scoring; (2) survey existing prompting techniques based on zero-shot and few-shot learning; and (3) evaluate how well these prompting techniques enable Large Language Models to perform automated essay scoring, and compare their effectiveness with the fine-tuning approach.
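To make the distinction between zero-shot and few-shot prompting concrete, the following is a minimal sketch of how a scoring prompt might be assembled and how a model's reply might be turned back into a score. The function names (`build_prompt`, `parse_score`), the prompt wording, and the 1 to 6 score range are illustrative assumptions, not part of the project specification or of any particular dataset; the call to an actual language model is deliberately left out.

```python
import re
from typing import List, Optional, Tuple


def build_prompt(
    essay: str,
    score_range: Tuple[int, int],
    examples: Optional[List[Tuple[str, int]]] = None,
) -> str:
    """Build a scoring prompt (hypothetical wording).

    With no `examples` this is a zero-shot prompt; with a list of
    (essay_text, score) pairs it becomes a few-shot prompt.
    """
    low, high = score_range
    parts = [
        f"Score the following student essay on an integer scale from {low} to {high}. "
        "Reply with the score only."
    ]
    # Few-shot demonstrations: already-scored essays shown before the target essay.
    for text, score in examples or []:
        parts.append(f"Essay: {text}\nScore: {score}")
    # The target essay, with the score left open for the model to complete.
    parts.append(f"Essay: {essay}\nScore:")
    return "\n\n".join(parts)


def parse_score(reply: str, score_range: Tuple[int, int]) -> Optional[int]:
    """Extract the first integer from a model reply; return None if it is
    missing or falls outside the allowed range."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    low, high = score_range
    return score if low <= score <= high else None


if __name__ == "__main__":
    # Assumed score range and toy essays, for illustration only.
    zero_shot = build_prompt("Recycling matters because ...", (1, 6))
    few_shot = build_prompt(
        "Recycling matters because ...",
        (1, 6),
        examples=[("Dogs are nice.", 2), ("A well-argued essay ...", 5)],
    )
    print(few_shot)
    print(parse_score("Score: 4", (1, 6)))
```

The same `parse_score` step can be applied to both prompting variants, so that their predictions can be compared with those of a fine-tuned model on an equal footing.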

Required knowledge

  • Strong programming skills (e.g., Python)
  • Basic knowledge in Data Science, Natural Language Processing, Machine Learning, and Large Language Models
  • The following are a plus: (i) good academic writing skills; and (ii) strong motivation to pursue a quality academic publication.