B. Ayton

h-index4

3papers

13citations

Novelty60%

AI Score28

Ranked #150,020 of 194,257 authors (top 77%)#9,057 in AI (top 72%)

3 Papers

4.5AISep 30, 2021Code

Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari

Benjamin Ayton, Masataro Asai

Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches have computed feature vectors for each screen using a hand designed feature set or a variational autoencoder trained on game screens (VAE-IW), and prune screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results in 55 Atari games demonstrate that it outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy-learning ($π$-IW, DQN) trained with 100x training budget by 30-to-22 and 31-to-17, and a state of the art data-efficient reinforcement learning (EfficientZero) trained with the same training budget and ran with 1.8x planning budget by 18-to-7 in Atari 100k benchmark, with no policy learning at all. The source code is available at github.com/ibm/atari-active-learning .

2.3SYNov 25, 2018

RADMPC: A Fast Decentralized Approach for Chance-Constrained Multi-Vehicle Path-Planning

Aaron Huang, Benjamin J. Ayton, Brian C. Williams

Robust multi-vehicle path-planning is important for ensuring the safety of multi-vehicle systems in applications like transportation, search and rescue, and robotic exploration. Chance-constrained methods like Iterative Risk Allocation (IRA)\cite{IRA} have been developed for situations where environmental disturbances are unbounded. However, chance-constrained methods for the multi-vehicle case generally use centralized strategies where the vehicle set is planned with couplings between all vehicle pairs. This approach is intractable as fleet size increases because computation time is exponential with respect to the number of vehicles being planned over due to a polynomial increase in coupling constraints between vehicle pairs. We present a faster approach for chance-constrained multi-vehicle path-planning that relies upon a decentralized path-planning method called Risk-Aware Decentralized Model Predictive Control (RADMPC) to rapidly approximate a centralized IRA approach. The RADMPC approximation is evaluated for vehicle interactions to determine the vehicle sets that should be planned in a coupled manner. Applying IRA to the smaller vehicle sets determined from the RADMPC approximation rapidly plans safe paths for the entire fleet. A Monte Carlo simulation analysis demonstrates the correctness of our approach and a significant improvement in computation time compared to a centralized IRA approach.

3.1AISep 4, 2018

Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

Benjamin J Ayton, Brian C Williams

Chance Constrained Markov Decision Processes maximize reward subject to a bounded probability of failure, and have been frequently applied for planning with potentially dangerous outcomes or unknown environments. Solution algorithms have required strong heuristics or have been limited to relatively small problems with up to millions of states, because the optimal action to take from a given state depends on the probability of failure in the rest of the policy, leading to a coupled problem that is difficult to solve. In this paper we examine a generalization of a CCMDP that trades off probability of failure against reward through a functional relationship. We derive a constraint that can be applied to each state history in a policy individually, and which guarantees that the chance constraint will be satisfied. The approach decouples states in the CCMDP, so that large problems can be solved efficiently. We then introduce Vulcan, which uses our constraint in order to apply Monte Carlo Tree Search to CCMDPs. Vulcan can be applied to problems where it is unfeasible to generate the entire state space, and policies must be returned in an anytime manner. We show that Vulcan and its variants run tens to hundreds of times faster than linear programming methods, and over ten times faster than heuristic based methods, all without the need for a heuristic, and returning solutions with a mean suboptimality on the order of a few percent. Finally, we use Vulcan to solve for a chance constrained policy in a CCMDP with over $10^{13}$ states in 3 minutes.