Aim/outline
Graphs or networks are effective tools for representing a variety of data across different domains. In biology, chemical compounds can be represented as networks, with atoms as nodes and chemical bonds as edges. Analysing these networks is important because it may enable AI-based approaches to drug discovery. This project focuses on representing and inferring chemical or biological networks as a form of relational and structural learning. Given a network dataset, we wish to infer a model of the distribution of its elements, possibly as a mixture of several distributions. We also wish to encode the biological networks in suitable formats, e.g., vector representations, so that existing machine learning algorithms (e.g., support vector machines) can readily be used for prediction tasks, such as predicting the bioassay of a given chemical network. One of the approaches considered will be the Bayesian information-theoretic Minimum Message Length (MML) principle.
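To illustrate the kind of pipeline described above, the sketch below (in Python, one of the suggested languages) turns each molecular graph into a fixed-length vector of simple graph statistics and trains a support vector machine on it. The graphs, bioassay labels, and feature choices are illustrative placeholders only, not a real dataset and not the representation learning method the project would develop; the MML-based inference itself is not shown.

    # Minimal sketch: molecular graphs -> fixed-length feature vectors -> SVM classifier.
    # All graphs, labels and features below are placeholders for illustration only.
    import networkx as nx
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def graph_features(g):
        """Map a molecular graph (atoms = nodes, bonds = edges) to a fixed-length vector."""
        degrees = [d for _, d in g.degree()]
        return np.array([
            g.number_of_nodes(),        # number of atoms
            g.number_of_edges(),        # number of bonds
            float(np.mean(degrees)),    # average atom degree
            float(np.max(degrees)),     # maximum atom degree
            nx.density(g),              # bond density
        ])

    # Toy "molecules" with hypothetical bioassay labels (1 = active, 0 = inactive).
    ring = nx.cycle_graph(6)    # a benzene-like ring skeleton
    chain = nx.path_graph(4)    # a short chain
    graphs = [ring, chain] * 10             # replicated only so cross-validation runs
    labels = np.array([1, 0] * 10)

    X = np.vstack([graph_features(g) for g in graphs])
    clf = SVC(kernel="rbf", gamma="scale")
    print("5-fold CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())

In the actual project, the hand-crafted statistics would be replaced by learned representations (e.g., graph embeddings as in Pan et al. 2018) or by models inferred under the MML principle.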
Expected outcomes: The student will learn inference and representation learning methods for network data. This knowledge can readily be applied to the analysis of other networks, including but not limited to social networks, citation networks, and communication networks. A research publication in a refereed AI conference or journal is expected.
References
Comley, Joshua W. and D.L. Dowe (2003), "General Bayesian Networks and Asymmetric Languages", Proc. 2nd Hawaii International Conference on Statistics and Related Fields, 5-8 June 2003.
Comley, Joshua W. and D.L. Dowe (2005), "Minimum Message Length and Generalized Bayesian Nets with Asymmetric Languages", Chapter 11 (pp265-294) in P. Grünwald, I. J. Myung and M. A. Pitt (eds.), Advances in Minimum Description Length: Theory and Applications, MIT Press, April 2005, ISBN 0-262-07262-9. [Final camera-ready copy was submitted in October 2003.]
David L. Dowe and Nayyar A. Zaidi (2010), "Database Normalization as a By-product of Minimum Message Length Inference", Proc. 23rd Australian Joint Conference on Artificial Intelligence (AI'2010) [Springer Lecture Notes in Artificial Intelligence (LNAI), vol. 6464], Adelaide, Australia, 7-10 December 2010, Springer, pp82-91.
Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao and Chengqi Zhang (2018), "Adversarially Regularized Graph Autoencoder for Graph Embedding", Proc. IJCAI 2018, pp2609-2615.
Shirui Pan, Jia Wu, Xingquan Zhu, Guodong Long and Chengqi Zhang (2015), "Finding the best not the most: regularized loss minimization subgraph selection for graph classification", Pattern Recognition, Vol. 48, No. 11, pp3783-3796.
Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang and Yang Wang (2016), "Tri-Party Deep Network Representation", Proc. IJCAI 2016, pp1895-1901.
G. Visser, P. E. R. Dale, D. L. Dowe, E. Ndoen, M. B. Dale and N. Sipe (2012), "A novel approach for modeling malaria incidence using complex categorical household data: The minimum message length (MML) method applied to Indonesian data", Computational Ecology and Software, Vol. 2, No. 3, pp140-159.
Wallace, C.S. (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer.
Wallace, C.S. and D.L. Dowe (1994b), "Intrinsic classification by MML - the Snob program", Proc. 7th Australian Joint Conf. on Artificial Intelligence, UNE, Armidale, Australia, November 1994, pp37-44.
Wallace, C.S. and D.L. Dowe (1999a), "Minimum Message Length and Kolmogorov Complexity", Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283.
Wallace, C.S. and D.L. Dowe (2000), "MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions", Statistics and Computing, Vol. 10, No. 1, Jan. 2000, pp73-83.
Required knowledge
The student should ideally have at least a reasonable mathematical background, including differential calculus (e.g., partial derivatives) and matrix determinants. The student should also be able to program in Matlab, Java, or Python. Ideally, the student understands the data analysis process, including data pre-processing, algorithm selection, and evaluation.