John Foley

h-index 6

4 papers

41 citations

Novelty 16%

AI Score 19

Ranked #186,260 of 194,257 authors (top 96%); #39,433 in LG (top 98%)

4 Papers

6.0 LG May 7, 2019 Code

Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Emma Tosch, Kaleigh Clary, John Foley et al.

Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high-quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source subset of Atari environments (https://kdl-umass.github.io/Toybox/) re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments.

9.1 LG Apr 12, 2019 Code

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

Kaleigh Clary, Emma Tosch, John Foley et al.

Reproducibility in reinforcement learning is challenging. Uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.
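To illustrate the last point of this abstract, here is a minimal sketch of reporting evaluation returns as a distribution rather than a single mean. It is not the authors' code: the function name `summarize_returns` and the sample values in `episode_returns` are hypothetical, invented purely for illustration, and the choice of quartiles as the summary is an assumption.

```python
import statistics


def summarize_returns(returns):
    """Describe a sample of episode returns by its spread, not only its mean."""
    ordered = sorted(returns)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "mean": statistics.fmean(ordered),
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical post-training returns from one agent (made-up numbers).
# The sample is bimodal: most episodes score low, a few score very high.
episode_returns = [12.0, 15.0, 11.0, 14.0, 13.0, 410.0, 395.0, 402.0, 16.0, 388.0]

summary = summarize_returns(episode_returns)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

With a bimodal sample like this one, the mean falls between the two clusters and describes no episode the agent actually played, which is the kind of case where a point estimate can mislead.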

11.4 AI Dec 6, 2018

ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents

John Foley, Emma Tosch, Kaleigh Clary et al.

It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art Reinforcement Learning (RL) agents have been shown to routinely equal or exceed human performance on many ALE tasks. Since ALE is based on emulation of original Atari games, the environment does not provide semantically meaningful representations of internal game state. This means that ALE has limited utility as an environment for supporting testing or model introspection. We propose ToyBox, a collection of reimplementations of these games that solves this critical problem and enables robust testing of RL agents.

1.7 IR May 1, 2018

On the Equivalence of Generative and Discriminative Formulations of the Sequential Dependence Model

Laura Dietz, John Foley

The sequential dependence model (SDM) is a popular retrieval model based on the theory of probabilistic graphical models. While it was originally introduced by Metzler and Croft as a Markov Random Field (aka discriminative probabilistic model), in this paper we demonstrate that it is equivalent to a generative probabilistic model. To build a foundation for future retrieval models, this paper details the axiomatic underpinning of the SDM as both a discriminative and a generative probabilistic model. The only difference is whether model parameters are estimated in log-space or Multinomial-space. We demonstrate that parameter estimation with grid-tuning negatively impacts the generative formulation, an effect that vanishes when parameters are estimated with coordinate-gradient descent. This is concerning, since empirical differences may be falsely attributed to improved models.