
[Bioinformatics Project] Machine learning based annotation on phage genomes

Primary supervisor



  • Prof. Trevor Lithgow
  • Dr. Jiawei Wang

Antimicrobial resistance (AMR) continues to evolve as a major threat to human health, and new strategies are required for the treatment of AMR infections. Bacteriophages (phages) that kill bacterial pathogens are being identified for use in phage therapies, with the intention of applying these bactericidal viruses directly to infection sites in bespoke phage cocktails. Using such a biological agent for infection control requires a deep understanding of the phage. Yet, despite the great unsampled diversity of phages available for this purpose, a critical issue hampering the rollout of phage therapy is the poor-quality annotation of many phage genomes.
To this end, our lab has developed machine learning-based toolkits to accurately predict different types of phage proteins, including anti-CRISPR proteins using the PaCRISPR web server (published in Nucleic Acids Research) and phage structural proteins using our STEP3 predictor (under revision at mSystems). We have also developed a comprehensive platform, AcrHub (published in Nucleic Acids Research), which provides an all-in-one service for categorizing, predicting and visualizing the anti-CRISPR proteins of phages.
This project has three aims that extend our current work: 1) expanding STEP3 to classify structural proteins into subtypes, such as those found in the capsid, neck and tail of phage particles; 2) expanding PaCRISPR to classify non-structural proteins into subtypes, such as anti-CRISPRs and various types of phage enzymes; and 3) building a platform that annotates phage genomes by combining the work from 1) and 2).
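To give prospective students a feel for the kind of task in Aim 1, the following is a minimal sketch of multi-class protein subtype prediction. It is not the STEP3 or PaCRISPR method. It assumes amino acid composition as the only feature and a simple nearest-centroid classifier, and the function names (`composition`, `train_centroids`, `predict`) and the toy sequences are hypothetical placeholders, not real phage proteins.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Characters outside the 20 standard amino acids are ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(labelled):
    """Average the feature vectors per class.

    `labelled` is a list of (sequence, subtype_label) pairs.
    """
    sums = {}
    sizes = Counter()
    for seq, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(seq)):
            acc[i] += value
        sizes[label] += 1
    return {
        label: [value / sizes[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids, seq):
    """Return the label whose centroid is closest (Euclidean) to the sequence."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Synthetic placeholder sequences (NOT real proteins), for illustration only.
TOY_TRAINING = [
    ("AAAAGGAAGG", "capsid"),
    ("AAGGAAAAGA", "capsid"),
    ("KKKKEEKKEE", "neck"),
    ("EEKKKEKKEK", "neck"),
    ("SSTTSSTTST", "tail"),
    ("TTSSTSSTTS", "tail"),
]

centroids = train_centroids(TOY_TRAINING)

if __name__ == "__main__":
    print(predict(centroids, "GAAGAAGGAA"))
```

A real predictor would need curated training sets, richer features and a properly validated model. The sketch only shows the overall shape of the problem: sequences in, one subtype label out.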

This project can be taken as a single- or double-semester research project.

  • Single semester (Aim 1)
  • Double semester (Aims 1, 2 and 3)

For more information, contact the primary supervisor, Prof. Trevor Lithgow.

Student cohort

Single Semester
Double Semester

Required knowledge

  • Machine learning (in R or Python)
  • Programming languages (Java, Python or R)