RO · Aug 11, 2020

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

arXiv:2008.04460v2 · 63 citations
AI Analysis

This work addresses the challenge of designing adaptable robotic systems and is aimed at robotics researchers, though its contribution is incremental: it extends RL to hardware optimization.

The paper tackles the problem of jointly optimizing robot hardware and control parameters with deep reinforcement learning by modeling the hardware as a policy. It compares the approach against traditional co-optimization methods and demonstrates it with a physical prototype of an underactuated hand.

Deep Reinforcement Learning (RL) has shown great success in learning complex control policies for a variety of applications in robotics. However, in most such cases, the hardware of the robot has been considered immutable, modeled as part of the environment. In this study, we explore the problem of learning hardware and control parameters together in a unified RL framework. To achieve this, we propose to model the robot body as a "hardware policy", analogous to and optimized jointly with its computational counterpart. We show that, by modeling such hardware policies as auto-differentiable computational graphs, the ensuing optimization problem can be solved efficiently by gradient-based algorithms from the Policy Optimization family. We present two such design examples: a toy mass-spring problem, and a real-world problem of designing an underactuated hand. We compare our method against traditional co-optimization approaches, and also demonstrate its effectiveness by building a physical prototype based on the learned hardware parameters. Videos and more details are available at https://roamlab.github.io/hwasp/ .
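The idea of treating hardware as a differentiable part of the policy can be illustrated with a small sketch. This is not the authors' code: the hanging-mass setup, the constants, and the names `Dual`, `rollout_cost`, and `co_optimize` are all assumptions made for illustration. It also swaps in plain gradient descent with backtracking through a deterministic rollout, rather than the Policy Optimization algorithms the paper uses. A controller gain `w` (the computational policy) and a spring stiffness `k` (the hardware policy) sit in one differentiable graph and are updated together.

```python
# Illustrative sketch only (assumed setup, not the paper's implementation).
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode autodiff value: val plus its gradient w.r.t. (w, k)."""
    val: float
    grad: tuple

    def _lift(self, other):
        # Treat plain numbers as constants with zero gradient.
        return other if isinstance(other, Dual) else Dual(float(other), (0.0, 0.0))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, tuple(a + b for a, b in zip(self.grad, o.grad)))

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, tuple(a - b for a, b in zip(self.grad, o.grad)))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule: d(ab) = a*db + b*da.
        return Dual(self.val * o.val,
                    tuple(self.val * b + o.val * a for a, b in zip(self.grad, o.grad)))

    __rmul__ = __mul__


def rollout_cost(w, k, goal=1.0, steps=100, dt=0.02):
    """Simulate a mass hanging on a spring; return tracking plus effort cost."""
    mass, gravity, damping = 1.0, 9.81, 4.0  # assumed constants
    zero = Dual(0.0, (0.0, 0.0))
    x, v, cost = zero, zero, zero
    for _ in range(steps):
        action = w * (goal - x)  # computational policy: proportional force
        # Hardware policy: the spring turns position into a restoring force.
        force = action - k * x + mass * gravity - damping * v
        v = v + force * (dt / mass)
        x = x + v * dt
        cost = cost + ((goal - x) * (goal - x) + 0.01 * action * action) * dt
    return cost


def evaluate(w, k):
    """Cost and its gradient with respect to (w, k)."""
    return rollout_cost(Dual(w, (1.0, 0.0)), Dual(k, (0.0, 1.0)))


def co_optimize(w=0.0, k=1.0, iters=200, lr=1.0):
    """Jointly update controller gain and spring stiffness by gradient descent."""
    cost = evaluate(w, k)
    for _ in range(iters):
        gw, gk = cost.grad
        step = lr
        while step > 1e-8:  # backtracking: accept only steps that lower the cost
            candidate = evaluate(w - step * gw, k - step * gk)
            if candidate.val < cost.val:
                w, k, cost = w - step * gw, k - step * gk, candidate
                break
            step *= 0.5
    return w, k, cost.val


if __name__ == "__main__":
    print(co_optimize())
```

Because the effort term penalizes the controller's force, a stiffness that lets the spring carry the weight near the goal can relieve the controller, which is the kind of trade-off joint optimization is meant to expose. The sketch does not constrain `k` to stay positive; a real design problem would need such bounds.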

