
AI agents must be guaranteed safe in deployment, since they can behave dangerously in unseen situations. Because reinforcement learning (RL) agents learn by exploring their environment, they are especially prone to selecting unsafe actions. In this line of work, we investigate how to best incorporate formal safety guarantees into the training of RL agents.

We categorize provably safe RL by the way unsafe actions are altered, namely action replacement, action projection, and action masking. Across all benchmark tasks, incorporating provable safety guarantees into RL training significantly improves the performance of the agent. We find that action replacement is an easy-to-apply and effective method for simple environments. In more complex environments, our new continuous action masking approach performs best.
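To give an intuition for action replacement, the sketch below wraps a toy 1-D system in a safety layer that executes the agent's action only if the successor state stays in a safe set, and otherwise substitutes a verified fallback. The dynamics, safe set, and fallback controller here are illustrative assumptions, not the benchmark environments from our publications.

```python
import numpy as np

# Toy 1-D dynamics x' = x + a; the state is considered safe while |x| <= X_MAX.
# All names and bounds are illustrative assumptions for this sketch.
X_MAX = 1.0


def is_safe(state: float, action: float) -> bool:
    """Check whether applying `action` keeps the successor state in the safe set."""
    return abs(state + action) <= X_MAX


def fallback_action(state: float) -> float:
    """A verified safe fallback: steer back toward the origin with bounded magnitude."""
    return float(np.clip(-state, -0.1, 0.1))


def safe_step(state: float, proposed_action: float) -> tuple[float, float]:
    """Action replacement: execute the agent's action only if it is provably safe,
    otherwise replace it with the fallback before stepping the dynamics."""
    action = proposed_action if is_safe(state, proposed_action) else fallback_action(state)
    next_state = state + action
    return next_state, action


if __name__ == "__main__":
    state = 0.95
    # The RL agent proposes an unsafe action; the safety layer replaces it.
    next_state, executed = safe_step(state, proposed_action=0.3)
    print(f"executed action {executed:+.2f}, next state {next_state:+.2f}")
```

During training, the agent can either receive the executed (possibly replaced) action or only the replaced transition; how this substitution is fed back to the learner is one of the design choices we compare in the benchmark paper.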



Publications:

  • NeurIPS 2024: Excluding the irrelevant: Focusing reinforcement learning through continuous action masking [OpenReview]
  • TMLR 2023: Provably safe reinforcement learning: Conceptual analysis, survey, and benchmarking [OpenReview]
