Jordan T. Bishop

2.0LGMay 17, 2023Code

Pittsburgh Learning Classifier Systems for Explainable Reinforcement Learning: Comparing with XCS

Jordan T. Bishop, Marcus Gallagher, Will N. Browne

Interest in reinforcement learning (RL) has recently surged due to the application of deep learning techniques, but these connectionist approaches are opaque compared with symbolic systems. Learning Classifier Systems (LCSs) are evolutionary machine learning systems that can be categorised as eXplainable AI (XAI) due to their rule-based nature. Michigan LCSs are commonly used in RL domains as the alternative Pittsburgh systems (e.g. SAMUEL) suffer from complex algorithmic design and high computational requirements; however they can produce more compact/interpretable solutions than Michigan systems. We aim to develop two novel Pittsburgh LCSs to address RL domains: PPL-DL and PPL-ST. The former acts as a "zeroth-level" system, and the latter revisits SAMUEL's core Monte Carlo learning mechanism for estimating rule strength. We compare our two Pittsburgh systems to the Michigan system XCS across deterministic and stochastic FrozenLake environments. Results show that PPL-ST performs on-par or better than PPL-DL and outperforms XCS in the presence of high levels of environmental uncertainty. Rulesets evolved by PPL-ST can achieve higher performance than those evolved by XCS, but in a more parsimonious and therefore more interpretable fashion, albeit with higher computational cost. This indicates that PPL-ST is an LCS well-suited to producing explainable policies in RL domains.

5.3LGMay 17, 2023Code

A Genetic Fuzzy System for Interpretable and Parsimonious Reinforcement Learning Policies

Jordan T. Bishop, Marcus Gallagher, Will N. Browne

Reinforcement learning (RL) is experiencing a resurgence in research interest, where Learning Classifier Systems (LCSs) have been applied for many years. However, traditional Michigan approaches tend to evolve large rule bases that are difficult to interpret or scale to domains beyond standard mazes. A Pittsburgh Genetic Fuzzy System (dubbed Fuzzy MoCoCo) is proposed that utilises both multiobjective and cooperative coevolutionary mechanisms to evolve fuzzy rule-based policies for RL environments. Multiobjectivity in the system is concerned with policy performance vs. complexity. The continuous state RL environment Mountain Car is used as a testing bed for the proposed system. Results show the system is able to effectively explore the trade-off between policy performance and complexity, and learn interpretable, high-performing policies that use as few rules as possible.

Jordan T. Bishop

2 Papers