Robust Learning and Planning with Neural Network Transition Models

Primary supervisor

Buser Say

Planning is the reasoning side of acting in Artificial Intelligence. Planning automates the selection and organization of actions to reach desired states of the world as best as possible. For many real-world planning problems, however, it is difficult to obtain a transition model that governs state evolution with complex dynamics. Fortunately, as visualized in Figure 1, recent works [1,2,3,4,5,6] have shown that these unknown transition models can be accurately approximated by (deep) neural networks, which can then be compiled into mathematical optimization models (e.g., MILP, Weighted Partial MaxSAT, pseudo-Boolean optimization) over a fixed horizon and solved optimally using off-the-shelf solvers.
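To make the compilation step concrete, the sketch below shows the standard big-M encoding of a single ReLU unit as linear constraints over a binary activation variable — the core building block used when a learned network is embedded into a MILP. The function name, the big-M value, and the toy inputs are illustrative, not taken from the cited papers; a real compilation would emit these constraints for every unit and time step and hand them to a solver.

```python
# A minimal sketch (illustrative names) of the big-M MILP encoding that
# forces y == max(0, x) for a ReLU unit, using a binary indicator z.
def relu_bigm_satisfied(x, y, z, M=100.0, eps=1e-9):
    """Return True iff (x, y, z) satisfies the four big-M constraints."""
    return (y >= -eps and                 # y >= 0
            y >= x - eps and              # y >= x
            y <= M * z + eps and          # z = 0 forces y <= 0, i.e. y = 0
            y <= x + M * (1 - z) + eps)   # z = 1 forces y <= x, i.e. y = x

def relu(x):
    return max(0.0, x)

# For any input x, the only feasible output is y = relu(x): the MILP
# solver therefore recovers the network's forward pass exactly.
for x in (-3.0, 0.0, 2.5):
    assert any(relu_bigm_satisfied(x, relu(x), z) for z in (0, 1))
    # any other output value is infeasible for both choices of z
    assert not any(relu_bigm_satisfied(x, relu(x) + 1.0, z) for z in (0, 1))
```

The big-M constant must upper-bound the unit's pre-activation magnitude; choosing it too loosely weakens the LP relaxation, which is one reason the binarized encodings of [4,5,6] can be more compact.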

One important limitation of this learning and planning framework is the optimizer's curse: suboptimal decision-making that results from planning optimally with respect to an incorrectly learned (neural network) model. This project will study the following two fundamental research questions, which are at the core of overcoming the optimizer's curse:

1. How can we plan robustly with respect to the prediction errors (i.e., interpolation and extrapolation errors) of the learned (neural network) models?

2. How can (neural network) models be trained to be robust to the optimizer's curse?
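The following toy example (entirely illustrative, not from the project) shows the effect the two questions target: when a learned model overestimates the value of some action, planning optimally against that model selects it, and the realized outcome falls short of both the model's prediction and the truly optimal choice.

```python
# Toy one-step illustration of the optimizer's curse. The action names
# and reward values are made up for the example.
true_reward    = {"a": 1.0, "b": 0.9}   # true outcome of each action
learned_reward = {"a": 1.0, "b": 1.2}   # learned model overestimates "b"

plan = max(learned_reward, key=learned_reward.get)  # optimal under the model
best = max(true_reward, key=true_reward.get)        # truly optimal action

assert plan == "b" and best == "a"                  # the plan is suboptimal
assert true_reward[plan] < learned_reward[plan]     # realized value < predicted
```

Question 1 asks how the planner can hedge against such model error at plan time (e.g., by reasoning over a set of plausible models), while Question 2 asks how training itself can discourage the systematic overestimation the planner exploits.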


Figure 1: Visualization of the learning and planning framework presented in [5] where red circles represent action variables, blue circles represent state variables, gray circles represent the activation units and w's represent the weights of the neural network.


References:

[1] Buser Say, Ga Wu, Yu Qing Zhou and Scott Sanner. Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. In 26th IJCAI, pages 750–756, 2017.

[2] Ga Wu, Buser Say and Scott Sanner. Scalable planning with Tensorflow for hybrid nonlinear domains. In 31st NeurIPS, pages 6273–6283, 2017.

[3] Ga Wu, Buser Say and Scott Sanner. Scalable planning with deep neural network learned transition models. JAIR, Volume 68, pages 571–606, 2020.

[4] Buser Say and Scott Sanner. Planning in factored state and action spaces with learned binarized neural network transition models. In 27th IJCAI, pages 4815–4821, 2018.

[5] Buser Say and Scott Sanner. Compact and efficient encodings for planning in factored state and action spaces with learned binarized neural network transition models. AIJ, Volume 285, Article 103291, 2020.

[6] Buser Say, Jo Devriendt, Jakob Nordström and Peter Stuckey. Theoretical and experimental results for planning with learned binarized neural network transition models. In CP 2020, pages 917–934, 2020.

Required knowledge

A successful candidate should have strong programming skills (e.g., in Python) as well as a background in at least one of the following:

  • (deep) neural networks, and/or
  • automated planning, and/or
  • mathematical optimization.

Project funding

Other
