
Does deep learning over-fit - and, if so, how does it work?

Primary supervisor

David Dowe

Methods of balancing model complexity with goodness of fit include Akaike's information criterion (AIC), Schwarz's Bayesian information criterion (BIC), minimum description length (MDL) and minimum message length (MML) (Wallace and Boulton, 1968; Wallace and Freeman, 1987; Wallace and Dowe, 1999a; Wallace, 2005).

There are many cases in which over-fitting - fitting a model which is too complex - causes a model to fail on new data.  This is indeed what AIC, BIC, MDL and MML would anticipate.  And yet deep learning methods can often work despite this.  This project investigates how deep learning can survive over-fitting and whether methods dealing with model complexity - e.g., AIC, BIC, MDL, MML - can enhance deep learning.
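
The failure mode that these criteria anticipate can be seen in a classical setting. A minimal sketch, assuming polynomial least-squares regression (the target function, noise level, and degrees are illustrative choices): a model with as many coefficients as training points interpolates the training data almost exactly, yet does worse on held-out points than a simpler fit.

```python
import numpy as np

rng = np.random.default_rng(1)

def poly_features(x, degree):
    # Design matrix with columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

n_train = 20
x_train = rng.uniform(-1, 1, n_train)
x_test = np.linspace(-1, 1, 200)

def target(x):
    return np.sin(np.pi * x)

y_train = target(x_train) + rng.normal(scale=0.2, size=n_train)
y_test = target(x_test)

def fit_and_mse(degree):
    X = poly_features(x_train, degree)
    w, *_ = np.linalg.lstsq(X, y_train, rcond=None)  # least-squares fit
    train_mse = np.mean((X @ w - y_train) ** 2)
    test_mse = np.mean((poly_features(x_test, degree) @ w - y_test) ** 2)
    return train_mse, test_mse

simple = fit_and_mse(3)     # under-parameterised: some bias, stable
flexible = fit_and_mse(19)  # as many coefficients as training points
```

Here the degree-19 fit drives the training error towards zero by fitting the noise, and its held-out error is worse than the degree-3 fit's - the classical picture that makes the success of heavily parameterised deep networks surprising.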


References:

  D. L. Dowe (2008a), "Foreword re C. S. Wallace", Computer Journal, Vol. 51, No. 5 (Sept. 2008) [Christopher Stewart WALLACE (1933-2004) memorial special issue], pp. 523-560. doi: 10.1093/comjnl/bxm117

  D. L. Dowe (2011a), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science (HPS Volume 7): Philosophy of Statistics, P. S. Bandyopadhyay and M. R. Forster (eds.), Elsevier [ISBN 13: 978-0-444-51862-0], pp. 901-982, 1 June 2011

  Wallace, C. S. (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer

  Wallace, C. S. and D. M. Boulton (1968), "An information measure for classification", Computer Journal, Vol. 11, No. 2, August 1968, pp. 185-194

  Wallace, C. S. and D. L. Dowe (1999a), "Minimum Message Length and Kolmogorov Complexity", Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp. 270-283

  Wallace, C. S. and P. R. Freeman (1987), "Estimation and inference by compact coding", J. Royal Statist. Soc. B, 49, pp. 240-252

Required knowledge

Knowledge of at least one of mathematics, statistics and/or machine learning principles is important - including at least an interest in probability theory.  Candidates should also have a strong computer science background with good programming skills.


Learn more about minimum entry requirements.