Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Ziheng Chen, Fabrizio Silvestri, Gabriele Tolomei, Jia Wang, He Zhu, Hongshik Ahn

Research output: Contribution to journal › Article › peer-review

Abstract

Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task. We then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments conducted on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate our method's ability to provide actionable recommendations and distill interpretable policy explanations in two practical real-world use cases.
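The abstract's core idea — crafting a counterfactual as a sequence of hybrid actions, where each step picks a feature (discrete) and a perturbation for it (continuous) — can be illustrated with a minimal toy sketch. This is not the paper's ReLAX implementation; the classifier, reward shaping, and random agent below are illustrative assumptions standing in for the black-box model and the trained DRL policy.

```python
import numpy as np

# Toy black-box classifier standing in for the model to explain
# (hypothetical; the real target model could be anything).
def black_box(x):
    # class 1 iff a weighted sum of features exceeds a threshold
    return int(x @ np.array([1.0, -2.0, 0.5]) > 0.5)

def cf_episode(x0, max_steps=50, rng=None):
    """One episode of counterfactual search as sequential decision-making.
    Each step applies a hybrid action:
      - discrete part:  which feature to edit
      - continuous part: how much to change it
    A trained DRL policy would choose these; here a random agent does."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    target = 1 - black_box(x0)          # goal: flip the original prediction
    for step in range(max_steps):
        i = rng.integers(len(x))        # discrete action component
        delta = rng.normal(0.0, 0.5)    # continuous action component
        x[i] += delta
        # A real reward would trade off the prediction flip against
        # sparsity/proximity costs; this sketch simply stops on a flip.
        if black_box(x) == target:
            return x, step + 1
    return None, max_steps              # no counterfactual found in budget

x0 = np.zeros(3)
cf, steps = cf_episode(x0)
```

Framing the search this way is what makes the method model-agnostic: the agent only ever queries `black_box` for predictions, never its gradients or internals.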

Original language: American English
Pages (from-to): 1443-1457
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 4
DOIs
State: Published - Apr 1 2024
Externally published: Yes

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications

Keywords

  • Counterfactual explanations
  • deep reinforcement learning (DRL)
  • explainable artificial intelligence (XAI)
  • machine learning (ML) explainability
