
Characterising Model Complexity for Data-driven Scientific Discovery


This project aims to explore techniques for characterising the complexity of statistical models. By complexity we mean a model's capacity to learn patterns and to generalise to new, unseen data. Interest in this area has recently resurged due to the discovery of phenomena such as "double descent" and the use of new model types such as deep neural networks, which challenge traditional notions of complexity. The primary approaches in this project are the Bayesian framework for statistical learning and information-theoretic techniques such as minimum message length (MML) and minimum description length (MDL). These provide tools for building accurate measures of how complex a statistical model is, and for optimising this complexity relative to the available data. Models of interest include generalised additive models, sets of rules and deep neural networks. An application to the data-driven discovery of rare scientific phenomena is anticipated, in either materials science (with the goal of discovering novel CO2 catalysts) or brain science (with the goal of describing states of consciousness).
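As an informal illustration of the description-length idea (a minimal sketch on synthetic data, using a polynomial regression model and a BIC-style approximation in place of a full two-part MML/MDL code; the data, names and penalty here are illustrative assumptions, not the project's prescribed method):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x = np.linspace(-1.0, 1.0, n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)    # unknown signal plus noise

    def description_length(degree):
        """Approximate two-part code length: cost of the model plus cost of the data given the model."""
        coeffs = np.polyfit(x, y, degree)                 # maximum-likelihood fit
        residuals = y - np.polyval(coeffs, x)
        sigma2 = np.mean(residuals ** 2)                  # ML estimate of the noise variance
        k = degree + 2                                    # coefficients plus the noise variance
        neg_log_lik = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        return neg_log_lik + 0.5 * k * np.log(n)          # data cost + model cost

    lengths = {d: description_length(d) for d in range(1, 15)}
    print("degree minimising description length:", min(lengths, key=lengths.get))

A more complex polynomial always fits the training data better, but its longer model description eventually outweighs the saving in data cost; the degree that minimises the total code length is the complexity the data can support.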

• Develop new techniques for assessing model complexity based on Bayesian learning or information-theoretic approaches

• Use these techniques to fit models, assess predictive performance and optimise model hyperparameters (a sketch follows this list)

• Apply these techniques to the discovery of rare scientific phenomena in materials science or brain science
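For the hyperparameter-optimisation aim, one concrete route is to maximise the Bayesian marginal likelihood (evidence), which penalises excess complexity automatically. The sketch below does this for the prior precision of a Bayesian linear regression on a polynomial basis; the basis, noise level and search grid are illustrative assumptions only:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 40, 9
    x = np.linspace(0.0, 1.0, n)
    Phi = np.vander(x, m, increasing=True)                # polynomial basis (design matrix)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    beta = 1.0 / 0.3 ** 2                                  # noise precision, assumed known here

    def log_evidence(alpha):
        """Log marginal likelihood of Bayesian linear regression with prior precision alpha."""
        A = alpha * np.eye(m) + beta * Phi.T @ Phi         # posterior precision matrix
        mN = beta * np.linalg.solve(A, Phi.T @ y)          # posterior mean of the weights
        E = 0.5 * beta * np.sum((y - Phi @ mN) ** 2) + 0.5 * alpha * mN @ mN
        return (0.5 * m * np.log(alpha) + 0.5 * n * np.log(beta) - E
                - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * n * np.log(2 * np.pi))

    alphas = np.logspace(-4, 2, 50)
    best = alphas[np.argmax([log_evidence(a) for a in alphas])]
    print(f"prior precision with highest evidence: {best:.3g}")

The evidence trades data fit against an Occam penalty (the log-determinant term), so no held-out validation set is needed to choose the hyperparameter.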

Areas involved:
Machine Learning, Statistics, Materials Science/Brain Science

Required knowledge

• Statistical modelling

• Machine learning algorithms

• Python (SciPy stack)

• MATLAB

Project funding

Project-based scholarship

Learn more about minimum entry requirements.