Jeffrey Tsang

LGJun 13, 2017

Hybrid Reward Architecture for Reinforcement Learning

Harm van Seijen, Mehdi Fatemi, Joshua Romoff et al.

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.

GTJan 29, 2014

The parametrized probabilistic finite-state transducer probe game player fingerprint model

Jeffrey Tsang

Fingerprinting operators generate functional signatures of game players and are useful for their automated analysis independent of representation or encoding. The theory for a fingerprinting operator which returns the length-weighted probability of a given move pair occurring from playing the investigated agent against a general parametrized probabilistic finite-state transducer (PFT) is developed, applicable to arbitrary iterated games. Results for the distinguishing power of the 1-state opponent model, uniform approximability of fingerprints of arbitrary players, analyticity and Lipschitz continuity of fingerprints for logically possible players, and equicontinuity of the fingerprints of bounded-state probabilistic transducers are derived. Algorithms for the efficient computation of special instances are given; the shortcomings of a previous model, strictly generalized here from a simple projection of the new model, are explained in terms of regularity condition violations, and the extra power and functional niceness of the new fingerprints demonstrated. The 2-state deterministic finite-state transducers (DFTs) are fingerprinted and pairwise distances computed; using this the structure of DFTs in strategy space is elucidated.

Jeffrey Tsang

2 Papers