AI · NE · May 4, 2012

Robot Navigation using Reinforcement Learning and Slow Feature Analysis

arXiv:1205.0986v1 · 2 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of state representation for reinforcement learning in robotics, but it is incremental, as it applies an existing unsupervised method to a specific domain.

The paper tackles the problem of filtering raw sensor data for reinforcement learning in robot navigation by using slow feature analysis (SFA) to learn filters from video input; the learned control navigates successfully in about 80% of test trials.

Applying reinforcement learning algorithms to real-life problems always bears the challenge of filtering the environmental state out of raw sensor readings. While most approaches use heuristics, biology suggests that there must exist an unsupervised method to construct such filters automatically. Besides extracting environmental states, the filters have to represent them in a fashion that supports modern reinforcement learning algorithms. Many popular algorithms use a linear architecture, so one should aim at filters that have good approximation properties in combination with linear functions. This thesis proposes the unsupervised method slow feature analysis (SFA) for this task. Presented with a random sequence of sensor readings, SFA learns a set of filters. With growing model complexity and number of training examples, the filters converge to trigonometric polynomial functions. These are known to possess excellent approximation capabilities and should therefore support the reinforcement learning algorithms well. We evaluate this claim on a robot. The task is to learn a navigational control in a simple environment using the least-squares policy iteration (LSPI) algorithm. The only accessible sensor is a head-mounted video camera, but without meaningful filtering, video images are not suited as LSPI input. We show that filters learned by SFA, based on a random-walk video of the robot, allow the learned control to navigate successfully in ca. 80% of the test trials.
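To make the idea of SFA "learning a set of filters" concrete, here is a minimal sketch of linear SFA in NumPy. It is not the thesis's implementation: the thesis works on video frames with a more complex model, whereas this toy example only unmixes a slow sine from a fast one. The function name `linear_sfa`, the mixing matrix `A`, and the test signals are all assumptions made for illustration.

```python
import numpy as np

def linear_sfa(x, n_features):
    """Linear SFA sketch (hypothetical helper, not the thesis's code).

    x: array of shape (T, D), one sensor reading per time step.
    Returns (mean, W) so that (x - mean) @ W yields unit-variance
    features ordered from slowest to fastest.
    """
    mean = x.mean(axis=0)
    xc = x - mean
    # Step 1: whiten the input so every direction has unit variance.
    cov = xc.T @ xc / len(xc)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()  # drop near-degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    z = xc @ whiten
    # Step 2: find the directions whose temporal differences vary least.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    _, d_evecs = np.linalg.eigh(dcov)  # eigenvalues ascending: slowest first
    W = whiten @ d_evecs[:, :n_features]
    return mean, W

# Toy data (assumed for illustration): a slow and a fast source, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(11.0 * t)
A = np.array([[1.0, 0.5], [0.3, 2.0]])  # arbitrary invertible mixing matrix
x = np.column_stack([slow, fast]) @ A.T

mean, W = linear_sfa(x, n_features=1)
y = (x - mean) @ W  # slowest extracted feature, shape (2000, 1)
corr = np.corrcoef(y[:, 0], slow)[0, 1]
print(abs(corr))  # close to 1: the filter recovers the slow source up to sign
```

The sign of an SFA feature is arbitrary, which is why the check uses the absolute correlation. In a setting like the one described in the abstract, features of this kind would serve as the basis functions handed to a linear method such as LSPI.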
