Primary supervisor

Enes Makalic

Co-supervisors

  • Lisa Ellis

Background and Motivation

Australian healthcare generates substantial volumes of unstructured clinical documentation, including referral letters, discharge summaries, specialist correspondence, pathology reports, and medication lists. These documents vary significantly in format, terminology, and quality across providers and institutions. Large language models (LLMs) have demonstrated promising capability in extracting structured information from unstructured text, yet their performance on Australian medical documents specifically remains poorly characterised. As AI-assisted health information tools increasingly enter consumer- and patient-facing applications, understanding where these models succeed and fail becomes a patient-safety question, not merely a technical one. This project addresses a significant gap in the Australian health AI literature, with findings applicable across digital health, clinical decision support, and health information management.

Aim/outline

Project Description

This project will systematically evaluate the accuracy and failure modes of current large language models when parsing unstructured Australian medical documents. The student will develop a benchmarking framework and annotated document corpus, evaluate model performance across document types and clinical specialties, and classify errors by frequency and potential for downstream clinical harm.
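For illustration only, the sketch below shows one simple form such a benchmarking framework could take: field-level precision and recall for model extractions scored against gold-standard annotations. The field names, the assumption that every field is a list of strings, and the exact-match comparison are simplifications chosen for the example rather than a prescribed design.

    # Illustrative sketch: field-level precision/recall for extracted clinical
    # fields scored against gold-standard annotations. Field names and the
    # list-of-strings layout are assumptions for the example.
    from collections import Counter

    FIELDS = ["medications", "conditions", "dosages", "dates", "provider"]

    def evaluate(predictions: list[dict], gold: list[dict]) -> dict:
        """Compute per-field precision and recall over paired documents."""
        counts = Counter()
        for pred, ref in zip(predictions, gold):
            for field in FIELDS:
                pred_vals = {v.lower() for v in pred.get(field, [])}
                gold_vals = {v.lower() for v in ref.get(field, [])}
                counts[f"{field}_tp"] += len(pred_vals & gold_vals)
                counts[f"{field}_fp"] += len(pred_vals - gold_vals)
                counts[f"{field}_fn"] += len(gold_vals - pred_vals)
        results = {}
        for field in FIELDS:
            tp, fp, fn = counts[f"{field}_tp"], counts[f"{field}_fp"], counts[f"{field}_fn"]
            results[field] = {
                "precision": tp / (tp + fp) if (tp + fp) else 0.0,
                "recall": tp / (tp + fn) if (tp + fn) else 0.0,
            }
        return results

    # Toy usage: one document, case-insensitive exact matching on each field.
    pred = [{"medications": ["Metformin 500 mg"], "conditions": [], "dosages": [], "dates": [], "provider": []}]
    ref = [{"medications": ["metformin 500 mg"], "conditions": ["type 2 diabetes"], "dosages": [], "dates": [], "provider": []}]
    print(evaluate(pred, ref))

The error classification described above would sit on top of counts of this kind, for example by tagging each false positive or false negative with an estimated severity of downstream clinical harm.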

Key research questions include:

  • How accurately do LLMs extract structured clinical information from Australian medical documents, including medications, conditions, dosages, dates, and provider details?
  • How does accuracy vary across document types, clinical specialties, and document quality, including handwritten and scanned formats?
  • What prompt engineering approaches or output validation strategies most effectively reduce clinically significant errors?
  • How do models perform on Australian-specific medical terminology and drug names compared to international benchmarks?
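As a hedged illustration of the prompt-and-validate pattern raised in the third question, the sketch below requests JSON output against a fixed set of keys and rejects responses that fail to parse or that omit required fields. The call_llm function is a hypothetical placeholder for whichever model API is ultimately chosen, and the key set and retry behaviour are assumptions for illustration only.

    # Illustrative sketch: schema-constrained extraction with basic output
    # validation. call_llm() is a hypothetical stand-in for a real model API;
    # the key set and retry policy are assumptions for the example.
    import json
    from typing import Optional

    REQUIRED_KEYS = {"medications", "conditions", "dosages", "dates", "provider"}

    PROMPT_TEMPLATE = """Extract the following fields from the clinical document below and
    return ONLY valid JSON with the keys: medications, conditions, dosages, dates, provider.
    Use empty lists for fields that are not present. Do not invent values.

    Document:
    {document}
    """

    def call_llm(prompt: str) -> str:
        """Hypothetical placeholder: send the prompt to the chosen model and return its text."""
        raise NotImplementedError("Connect this to the selected model API.")

    def extract(document: str, max_retries: int = 2) -> Optional[dict]:
        """Prompt the model, then check that the output parses and contains the expected keys."""
        prompt = PROMPT_TEMPLATE.format(document=document)
        for _ in range(max_retries + 1):
            raw = call_llm(prompt)
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                continue  # malformed JSON: retry rather than pass it downstream
            if isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed):
                return parsed
        return None  # flag the document for manual review instead of accepting bad output

Even a simple guard of this kind changes which errors reach downstream use, which is why output validation strategies appear alongside prompt engineering in the questions above.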

The project sits at the intersection of natural language processing, health informatics, and AI safety, and offers opportunities to extend into areas including human-AI interaction in health contexts, error taxonomy development, and evaluation framework design. The student will collaborate with an industry partner that brings clinical domain expertise, providing access to realistic document types and real-world validation of findings.

URLs/references

  1. Jurafsky, D., & Martin, J. H. Speech and Language Processing (3rd ed. draft).
  2. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  3. Brown, T., et al. (2020). "Language Models are Few-Shot Learners." Advances in Neural Information Processing Systems (NeurIPS).

Required knowledge

Students undertaking this project should ideally have knowledge or experience in several of the following areas:

  • Programming experience in Python, including data processing and scripting
  • Familiarity with machine learning or natural language processing concepts
  • Understanding of large language models and prompt engineering fundamentals
  • Experience working with structured and unstructured textual data
  • Basic statistical analysis and evaluation methodology
  • Familiarity with version control systems such as Git
  • Interest in healthcare AI, digital health, or health informatics
  • Ability to critically analyse model outputs and error patterns
  • Experience with libraries or frameworks such as Hugging Face Transformers, PyTorch, LangChain, or spaCy is desirable but not essential
  • Strong written communication and research skills

The project is suitable for students with backgrounds in computer science, data science, artificial intelligence, software engineering, biomedical engineering, or related disciplines.