Programmatically interpretable reinforcement learning
Programmatically interpretable reinforcement learning, Verma et al., ICML 2018
Being able to trust (interpret, verify) a controller learned through reinforcement learning (RL) is one of the key challenges for real-world deployments of RL that we looked at earlier this week. It’s also an essential requirement for agents in human-machine collaborations (i.e., all deployments at some level), as we saw last week. Since reading some of Cynthia Rudin’s work last year I’ve been fascinated with the notion of interpretable models. I believe there is a large set of use cases where an interpretable model should be the default choice: there are so many deployment benefits, even putting aside any ethical or safety concerns.
So how do you make an interpretable model? Today’s paper choice is the third paper we’ve looked at along these lines (following CORELS and RiskSlim), enough for a recognisable pattern to start to emerge. The first step is to define a language — grammar and associated semantics — in which the ultimate model to be deployed will be expressed. For CORELS this consists of simple rule-based expressions, and for RiskSlim it is scoring sheets. For Programmatically Interpretable Reinforcement Learning (PIRL), as we shall see, policies are expressed as programs in a high-level, domain-specific programming language.
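To make the “grammar and associated semantics” step concrete, here is a minimal sketch in Python. This is not the paper’s actual DSL; the expression types, observation names (`speed`, `trackPos`), and constants are all illustrative. The point is only the shape of the idea: the grammar is a small set of expression constructors, the semantics is an interpreter, and a policy is a human-readable program rather than an opaque network of weights.

```python
# A toy policy language in the PIRL spirit: grammar (the dataclasses)
# plus semantics (the evaluate function). All names are illustrative.
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Const:
    value: float

@dataclass
class Obs:
    name: str            # an observation variable, e.g. "speed"

@dataclass
class BinOp:
    op: str              # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"

@dataclass
class If:
    cond: "Expr"         # interpreted as: cond > 0
    then: "Expr"
    other: "Expr"

Expr = Union[Const, Obs, BinOp, If]

def evaluate(e: Expr, obs: Dict[str, float]) -> float:
    """Semantics: evaluate a policy program on a single observation."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Obs):
        return obs[e.name]
    if isinstance(e, BinOp):
        a, b = evaluate(e.left, obs), evaluate(e.right, obs)
        return {"+": a + b, "-": a - b, "*": a * b}[e.op]
    if isinstance(e, If):
        return evaluate(e.then, obs) if evaluate(e.cond, obs) > 0 \
               else evaluate(e.other, obs)
    raise TypeError(f"unknown expression: {e!r}")

# An interpretable policy a human can read and audit: steer against
# track position, more gently at high speed (made-up numbers).
policy = If(
    cond=BinOp("-", Obs("speed"), Const(50.0)),
    then=BinOp("*", Const(-0.5), Obs("trackPos")),
    other=BinOp("*", Const(-1.0), Obs("trackPos")),
)

print(evaluate(policy, {"speed": 60.0, "trackPos": 0.2}))  # -> -0.1
```

Notice the deployment appeal: the learned artefact is just this small program, so it can be inspected, tested, and even formally verified independently of the machinery used to find it.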

