FMitF: Track I: Synthesis and Verification for Programmatic Reinforcement Learning

Project Details

Description

Deep reinforcement learning (RL) has recently gained tremendous popularity in numerous decision-making systems. However, deploying RL techniques in safety-critical domains such as autonomous driving has raised concerns about their trustworthiness. RL applications can behave unexpectedly because such systems have intractably large state spaces, only a small fraction of which is explored during training. Lack of trust in a deep neural RL policy is further exacerbated by the difficulty of identifying how logical notions of safety relate to the uninterpretable structure of the policy network. The project's novelty is to explore domain-specific programs as a new learning representation for trustworthy RL. This representation allows an RL system to be interpreted, formally verified, and debugged like an ordinary programming system. The project's impact is to shape new methodologies for designing highly interpretable and verifiable RL systems capable of making reliable decisions.

The main contribution of the project is a programmatic RL framework for synthesizing policies that meet both formal correctness specifications and quantitative performance objectives. First, the project develops program synthesis techniques that compose primitive RL policies into composite, interpretable programs to solve sophisticated RL environments. To explore novel compositions, it applies efficient policy gradient methods to search in a continuous relaxation of the discrete space of language grammar rules. Second, the project reconciles programmatic policy synthesis with program verification in a correct-by-construction RL loop, constraining policy performance optimization to provably safe regions of the state space. Third, the project investigates the transferability of programmatic RL by lifting the synthesis and verification framework to first-order relational environments, leveraging relational abstractions to transfer programmatic policies to unseen environments with formal guarantees.
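The continuous-relaxation idea described above can be illustrated with a minimal, self-contained sketch. This is not the project's actual system; it is a hypothetical toy in which two primitive policies ("hold", which always outputs 0, and "track", which outputs the current state) are composed through softmax weights, and gradient ascent on those weights, here estimated by finite differences, drives the relaxed program toward the primitive that maximizes reward. The environment, primitives, and reward are all invented for illustration.

```python
import math

def softmax(z):
    # numerically stable softmax over relaxation parameters
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical primitive policies: "hold" ignores the state; "track" copies it.
primitives = [lambda x: 0.0, lambda x: x]

def composite_action(theta, x):
    # Relaxed program: a softmax-weighted mixture of the primitives,
    # standing in for a discrete choice of grammar rule.
    w = softmax(theta)
    return sum(wi * p(x) for wi, p in zip(w, primitives))

def objective(theta, states):
    # Toy reward -(x - a)^2: perfect tracking of the state earns 0.
    return sum(-(x - composite_action(theta, x)) ** 2 for x in states) / len(states)

def grad(theta, states, eps=1e-5):
    # Finite-difference gradient of the relaxed (differentiable) objective.
    g = []
    for i in range(len(theta)):
        tp, tm = theta[:], theta[:]
        tp[i] += eps
        tm[i] -= eps
        g.append((objective(tp, states) - objective(tm, states)) / (2 * eps))
    return g

states = [-2.0, -1.0, 0.5, 1.5]
theta = [0.0, 0.0]          # start with both primitives equally weighted
for _ in range(200):
    g = grad(theta, states)
    theta = [t + 0.5 * gi for t, gi in zip(theta, g)]

weights = softmax(theta)
# Gradient ascent concentrates weight on the "track" primitive; discretizing
# by argmax then recovers an interpretable program: a(x) = x.
best = max(range(len(weights)), key=lambda i: weights[i])
```

After optimization, reading off the argmax of the weights discretizes the relaxed program back into a single grammar rule, mirroring (in miniature) how a continuous search space can be used to explore discrete program compositions.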
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 10/1/21 – 9/30/25

Funding

  • National Science Foundation: $749,708.00
