Primary supervisor: Julian Gutierrez Santiago
In this project, we will study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective, these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective, the introduction of learning capabilities allows for practical applications in large, stochastic, unknown environments. The existing work in this area is, however, limited. Of the frameworks that consider full linear temporal logic or have correctness guarantees, all methods thus far consider only the case of a single temporal logic specification and a single agent. To overcome this limitation, we have developed the first multi-agent reinforcement learning technique for temporal logic specifications, which is also novel in its ability to handle multiple specifications. We provided correctness and convergence guarantees for the main algorithm, ALMANAC (Automaton/Logic Multi-Agent Natural Actor-Critic), even when using function approximation such as neural networks, under the assumption that agents can cooperate in the learning process. The aim of this project is to evaluate ALMANAC in practice in settings where adversarial, rather than cooperative, behaviour is intended. In particular, we want to evaluate ALMANAC in cases where adversarial behaviour is represented using the temporal logic specifications of the agents in the learning system.