Primary supervisorJulian Gutierrez Santiago
Reinforcement Learning (RL) systems can be represented as Markov Decision Processes (MDPs), which are graph-based models of probabilistic behaviour. Typically, a logic over MDPs predicates only about temporal or epistemic properties of such systems, but fails to express properties about the learning behaviour that such systems may represent. In this project, the aim is to investigate extensions of temporal and epistemic logics to be able to express learning properties of RL systems consisting of multiple agents. In particular, the goal of this project is to define the syntax and semantics of such a logic and to develop optimal model checking decision procedures as well as an associated proof-of-concept tool implementation to evaluate the new logic-based reasoning framework.
The student should have strong mathematical skills, and ideally some knowledge of computational complexity, automata theory, and basics of logic and proof.