Dr. Andreas Aristidou

Collaborative Museum Heist with Reinforcement Learning

Eleni Evripidou, Andreas Aristidou, Panayiotis Charalambous

Computer Animation and Virtual Worlds, Volume 34, Issue 3-4, May 2023.

Presented at the 36th International Conference on Computer Animation and Social Agents (CASA'23), May 2023.

In this paper, we present our initial findings from applying Reinforcement Learning techniques to a museum heist game, in which trained robbers with different skills learn to cooperate and maximize individual and team rewards while avoiding detection by scripted security guards and cameras. The results indicate that training both sides concurrently in an adversarial game setting is feasible.


Abstract

Non-playable characters (NPCs) play a crucial role in enhancing immersion in video games. However, traditional NPC behaviors are often hard-coded using methods such as Finite State Machines, Decision Trees, and Behavior Trees. This approach has two main limitations: first, complex cooperative behaviors are difficult to implement; second, human players can easily identify and exploit patterns in the behavior. To overcome these challenges, Reinforcement Learning (RL) can be used to generate dynamic, real-time NPC responses to human player actions. In this paper, we report first results of applying RL techniques to a non-zero-sum, asymmetric adversarial game using a multi-agent team. The game environment simulates a museum heist in which a successfully trained team of robbers with different skills (Locksmith, Technician) aims to steal valuable items from the museum without being detected by scripted security guards and cameras. Both robber agents were trained concurrently with separate policies and received both individual and group reward signals. Through this training process, the agents learned to cooperate effectively and to use their skills to maximize both individual and team benefits. These results demonstrate the feasibility of realizing the full game, in which both robbers and security guards are trained at the same time to achieve their adversarial goals.
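The abstract states that each robber received both an individual and a group reward signal. The following is a minimal sketch of how such per-step signals could be produced, not the reward design used in the paper: the event types (`StepEvents`), the assumption that the Technician is the one who disables cameras, and all reward magnitudes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Role names come from the abstract; everything else here is a hypothetical
# illustration, not the reward design used in the paper.
ROLES = ("Locksmith", "Technician")


@dataclass
class StepEvents:
    """What happened during one environment step (hypothetical event set)."""
    doors_unlocked: Dict[str, int] = field(default_factory=dict)    # role -> count
    cameras_disabled: Dict[str, int] = field(default_factory=dict)  # role -> count
    valuables_stolen: int = 0   # items secured by the team this step
    detected: bool = False      # True if a guard or camera spotted a robber


def individual_reward(role: str, ev: StepEvents) -> float:
    """Per-agent signal: reward each robber for using its own skill."""
    if role == "Locksmith":
        return 0.1 * ev.doors_unlocked.get(role, 0)
    if role == "Technician":  # assumption: the Technician handles cameras
        return 0.1 * ev.cameras_disabled.get(role, 0)
    return 0.0


def group_reward(ev: StepEvents) -> float:
    """Shared signal: identical for every robber on the team."""
    return 1.0 * ev.valuables_stolen - (1.0 if ev.detected else 0.0)


def step_rewards(ev: StepEvents) -> Dict[str, Tuple[float, float]]:
    """Return {role: (individual, group)} so the two signals stay separate."""
    shared = group_reward(ev)
    return {role: (individual_reward(role, ev), shared) for role in ROLES}


def combined(individual: float, group: float, group_weight: float = 1.0) -> float:
    """Collapse both signals into one scalar for learners that take a single reward."""
    return individual + group_weight * group
```

Keeping the two terms separate mirrors the distinction the abstract draws: a trainer with native support for team rewards could consume the shared term directly, while a simpler independent learner could use `combined` with a hypothetical `group_weight`.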

Ray sensors. Starting from top left: back (pink), front (red), security camera (blue), robbers & guards (green), and valuables (yellow).
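The figure shows several ray sensors, each detecting a different category of object. As a hedged illustration of what one such sensor might feed to a policy, the sketch below casts a fan of 2D rays against circular colliders and encodes each ray as a one-hot tag vector, a "no hit" flag, and a normalized hit distance. This encoding, the circle colliders, the tag list in `TAGS`, and the function names are all assumptions chosen for the example; the paper's actual sensor configuration is not described here.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical tags, loosely following the object categories in the caption.
TAGS = ("camera", "robber", "guard", "valuable")

Obstacle = Tuple[float, float, float, str]  # (center x, center y, radius, tag)


def ray_circle_distance(ox: float, oy: float, dx: float, dy: float,
                        cx: float, cy: float, r: float) -> Optional[float]:
    """Distance along a unit-length ray from (ox, oy) to a circle, or None on a miss."""
    # Solve |o + t*d - c|^2 = r^2 for t, using |d| = 1.
    fx, fy = ox - cx, oy - cy
    b = fx * dx + fy * dy
    c = fx * fx + fy * fy - r * r
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    if t < 0:  # ray origin is inside the circle; take the exit point
        t = -b + math.sqrt(disc)
    return t if t >= 0 else None


def ray_observation(origin: Tuple[float, float], heading: float,
                    obstacles: Sequence[Obstacle], n_rays: int = 5,
                    fov: float = math.pi / 2, max_dist: float = 10.0) -> List[float]:
    """Per ray: one-hot over TAGS, then a 'no hit' flag, then distance / max_dist."""
    obs: List[float] = []
    for i in range(n_rays):
        # Spread rays evenly across the field of view, centered on the heading.
        angle = heading if n_rays == 1 else heading - fov / 2 + fov * i / (n_rays - 1)
        dx, dy = math.cos(angle), math.sin(angle)
        best_t: Optional[float] = None
        best_tag: Optional[str] = None
        for cx, cy, r, tag in obstacles:
            t = ray_circle_distance(origin[0], origin[1], dx, dy, cx, cy, r)
            # Keep only the closest hit within sensor range.
            if t is not None and t <= max_dist and (best_t is None or t < best_t):
                best_t, best_tag = t, tag
        one_hot = [1.0 if best_tag == tag else 0.0 for tag in TAGS]
        if best_t is None:
            obs.extend(one_hot + [1.0, 1.0])  # nothing hit: flag set, distance at max
        else:
            obs.extend(one_hot + [0.0, best_t / max_dist])
    return obs
```

For example, `ray_observation((0.0, 0.0), 0.0, [(5.0, 0.0, 1.0, "guard")], n_rays=1)` yields the guard's one-hot entry, a cleared "no hit" flag, and a normalized distance of 0.4. Under this sketch, separate sensors such as a forward-facing and a backward-facing fan would simply be concatenated into the agent's observation, which is an assumption about how the sensors in the figure are combined.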

Acknowledgments

This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement 739578 and the Government of the Republic of Cyprus through the Deputy Ministry of Research, Innovation and Digital Policy; and internal funds from the University of Cyprus (project: Demonstration).