Exploring HLA polymorphism through data analytics

Primary supervisor

Sri Ramarathinam

Co-supervisors


Human Leukocyte Antigen (HLA) molecules are critical for immune surveillance and organ transplantation. More than 22,000 HLA class I proteins and 8,000 HLA class II proteins have been reported to date, making this one of the most polymorphic regions of the human genome. HLA molecules play an important role in distinguishing self from foreign in the human body. Every individual expresses a set of class I (HLA-A, -B, -C) and class II (HLA-DR, -DQ, -DP) HLA molecules. Currently, HLA typing in the clinic is done by DNA sequencing, but the immune reaction depends on recognition of the corresponding protein.

This project aims to explore the potential of proteomics techniques in HLA typing. Proteomics using mass spectrometry identifies tens of thousands of short protein sequences (peptides), from which the parent proteins are inferred. The resulting tabular data is often difficult to interpret without data analytics and automated reporting techniques. Understanding protein sequence similarities using string matching algorithms is also vital for this research. The first part of the project will apply computational algorithms, string alignment tools and statistical methods to complex proteomics data to assemble short peptide sequences into consensus protein sequences. The latter part will involve visualising HLA protein sequence data, reducing redundancy, mapping relationships between similar molecules and identifying proteotypic (unique) sequences for each of the key HLA molecules.

The student will be required to use programming languages such as R and/or Python to interrogate, manipulate, analyse and visualise genomic and proteomic datasets. They will gain hands-on experience with real-world biomedical data and learn critical data science skills including programming, data manipulation, statistical analysis and reporting.
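
As a rough illustration of what identifying proteotypic sequences can involve, the Python sketch below lists the fixed-length substrings that occur in exactly one protein of a set. It is a minimal sketch, not the project's method: the sequences are invented toy strings (not real HLA proteins), and the function names `kmers` and `proteotypic_peptides` are hypothetical. Real peptides vary in length and arise from enzymatic digestion, which this example ignores.

```python
from collections import defaultdict

# Toy sequences for illustration only; these are NOT real HLA protein sequences.
PROTEINS = {
    "toy_A": "ACDEFGHIK",
    "toy_B": "ACDEFGHLK",
    "toy_C": "ACDQFGHIK",
}


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def proteotypic_peptides(proteins, k):
    """Map each protein name to the k-mers that occur in that protein only."""
    # Record which proteins contain each k-mer.
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for pep in kmers(seq, k):
            owners[pep].add(name)

    # Keep only k-mers owned by a single protein.
    unique = {name: set() for name in proteins}
    for pep, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(pep)
    return unique


if __name__ == "__main__":
    for name, peptides in proteotypic_peptides(PROTEINS, 4).items():
        print(name, sorted(peptides))
```

In this toy set, `toy_A` has no unique 4-mer because each of its substrings is shared with `toy_B` or `toy_C`. Closely related molecules can behave the same way, which is one reason longer peptides or combinations of peptides may be needed to tell them apart.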

Student cohort

Double Semester

Aim/outline

To develop tools to visualise and capture the diversity of HLA molecules to facilitate HLA typing from mass spectrometry data.
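
One simple way to picture typing from peptide data is to rank candidate proteins by how much of each sequence the observed peptides cover. The Python sketch below does this with exact substring matching. It rests on stated assumptions only: the candidate sequences and peptides are invented toy strings, `coverage` and `rank_candidates` are hypothetical helper names, and a real workflow would also need to handle identification scores, modifications and ambiguous matches.

```python
def coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this exact match.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def rank_candidates(candidates, peptides):
    """Return (name, coverage) pairs sorted from highest to lowest coverage."""
    scores = {name: coverage(seq, peptides) for name, seq in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; not real HLA sequences or peptides.
    candidates = {"toy_A": "ACDEFGHIK", "toy_B": "ACDEFGHLK"}
    observed = ["ACDEF", "FGHLK"]
    for name, score in rank_candidates(candidates, observed):
        print(f"{name}: {score:.2f}")
```

Here the shared peptide `ACDEF` supports both candidates equally, while `FGHLK` matches only `toy_B`, so `toy_B` ranks first.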

Required knowledge

Programming skills (R and/or Python)