LG, AI, CV
Oct 16, 2017

Gradient-free Policy Architecture Search and Adaptation

arXiv:1710.05958v1
30 citations
Originality: Incremental advance
AI Analysis

This work addresses safer policy learning for autonomous driving in simulated environments. It appears incremental, as it builds on existing gradient-free optimization and adaptation techniques.

The researchers tackled autonomous driving policy learning by developing a gradient-free method for architecture search and adaptation that learns from both demonstrations and environmental rewards, resulting in safer learning with a reduced cumulative crash metric in simulated driving.

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward, we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.
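
As a rough illustration of the two-stage idea in the abstract (fit a policy to demonstrations in a source domain, then adapt it to target-domain reward without gradients), here is a minimal sketch. It is not the authors' algorithm: the linear policy, the toy expert `2 * obs`, the shifted target reward, and the simple accept-if-better random search are all assumptions made for illustration, and the architecture search step is omitted entirely.

```python
import random

def hill_climb(score, params, steps=500, sigma=0.1, seed=0):
    """Gradient-free random search: keep a perturbation only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = list(params), score(params)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# Fixed set of toy observations shared by both domains (assumption).
OBS = [i / 10.0 for i in range(-10, 11)]

def policy(params, obs):
    """Linear policy: action = w * obs + b."""
    w, b = params
    return w * obs + b

def imitation_score(params):
    """Negative mean squared error against a hypothetical expert (action = 2 * obs)."""
    return -sum((policy(params, o) - 2.0 * o) ** 2 for o in OBS) / len(OBS)

def target_reward(params):
    """Hypothetical target-domain reward: the ideal action is shifted by 0.5."""
    return -sum((policy(params, o) - (2.0 * o + 0.5)) ** 2 for o in OBS) / len(OBS)

# Stage 1: fit the policy to source-domain demonstrations.
demo_params, _ = hill_climb(imitation_score, [0.0, 0.0], seed=0)
reward_before = target_reward(demo_params)

# Stage 2: adapt the demonstrated policy to target-domain reward.
adapted_params, reward_after = hill_climb(target_reward, demo_params, seed=1)
```

Because a candidate is accepted only when it improves the score, `reward_after` can never fall below `reward_before`. Starting the second stage from the demonstrated parameters, rather than from scratch, is the toy analogue of avoiding early catastrophic failures.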

