Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models

Yuning Wang, He Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent’s exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent’s reward performance.
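The learn, verify, shield loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says VELM learns symbolic environment models and runs formal reachability analysis, whereas the sketch below assumes a linear model fitted by least squares and a simple interval (box) over-approximation. All names (`fit_linear_model`, `step_box`, `is_safe`, `shield`), the backup gain `K`, the error bound `eps`, and the toy dynamics are hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_model(X, U, X_next):
    """Least-squares fit of x' = A x + B u from observed transitions (assumed model class)."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    n = X.shape[1]
    return W[:n].T, W[n:].T  # A has shape (n, n), B has shape (n, m)

def step_box(lo, hi, M, c, eps):
    """Box that contains the image of [lo, hi] under x -> M x + c, inflated by eps."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    new_center = M @ center + c
    new_radius = np.abs(M) @ radius + eps
    return new_center - new_radius, new_center + new_radius

def is_safe(x, u, A, B, K, safe_lo, safe_hi, horizon, eps):
    """Apply u once, then the backup controller u = -K x for `horizon` steps;
    report whether every over-approximated box stays inside the safe box."""
    lo, hi = step_box(x, x, A, B @ u, eps)
    closed_loop = A - B @ K
    for _ in range(horizon):
        if np.any(lo < safe_lo) or np.any(hi > safe_hi):
            return False
        lo, hi = step_box(lo, hi, closed_loop, np.zeros_like(lo), eps)
    return bool(np.all(lo >= safe_lo) and np.all(hi <= safe_hi))

def shield(x, u_proposed, A, B, K, safe_lo, safe_hi, horizon=5, eps=0.01):
    """Pass the agent's action through if verified in the learned model, else use the backup."""
    if is_safe(x, u_proposed, A, B, K, safe_lo, safe_hi, horizon, eps):
        return u_proposed
    return -K @ x

# Toy environment (assumed): x' = x + u, safe set is the interval [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
U = rng.uniform(-1.0, 1.0, size=(50, 1))
A_hat, B_hat = fit_linear_model(X, U, X + U)

K = np.array([[0.5]])  # hypothetical backup gain
safe_lo, safe_hi = np.array([-1.0]), np.array([1.0])
x = np.array([0.5])

kept = shield(x, np.array([0.2]), A_hat, B_hat, K, safe_lo, safe_hi)
overridden = shield(x, np.array([1.0]), A_hat, B_hat, K, safe_lo, safe_hi)
```

In this toy setting the small action is verified and passed through, while the large action would leave the safe interval in the learned model and is replaced by the backup action. The sketch does not verify the backup action itself, and its guarantee is only as good as the assumed error bound `eps` on the learned model.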

Original language: American English
Title of host publication: Computer Aided Verification - 36th International Conference, CAV 2024, Proceedings
Editors: Arie Gurfinkel, Vijay Ganesh
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 232-255
Number of pages: 24
ISBN (Print): 9783031656323
DOIs
State: Published - 2024
Externally published: Yes
Event: 36th International Conference on Computer Aided Verification, CAV 2024 - Montreal, Canada
Duration: Jul 24 2024 to Jul 27 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14683 LNCS

Conference

Conference: 36th International Conference on Computer Aided Verification, CAV 2024
Country/Territory: Canada
City: Montreal
Period: 7/24/24 to 7/27/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Controller Synthesis
  • Reinforcement Learning
  • Safe Exploration
  • Safety Verification
