LGOct 22, 2022
Solving Continuous Control via Q-learningTim Seyde, Peter Werner, Wilko Schwarting et al. · deepmind
While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a variety of continuous control tasks.
ROJan 15
Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable SetsSimin Liu, Tong Zhao, Bernhard Paus Graesdal et al.
If we consider human manipulation, it is clear that contact-rich manipulation (CRM)-the ability to use any surface of the manipulator to make contact with objects-can be far more efficient and natural than relying solely on end-effectors (i.e., fingertips). However, state-of-the-art model-based planners for CRM are still focused on feasibility rather than optimality, limiting their ability to fully exploit CRM's advantages. We introduce a new paradigm that computes approximately optimal manipulator plans. This approach has two phases. Offline, we construct a graph of mutual reachable sets, where each set contains all object orientations reachable from a starting object orientation and grasp. Online, we plan over this graph, effectively computing and sequencing local plans for globally optimized motion. On a challenging, representative contact-rich task, our approach outperforms a leading planner, reducing task cost by 61%. It also achieves a 91% success rate across 250 queries and maintains sub-minute query times, ultimately demonstrating that globally optimized contact-rich manipulation is now practical for real-world tasks.
LGApr 5, 2024
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control ResolutionTim Seyde, Peter Werner, Wilko Schwarting et al. · deepmind
Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics while final performance does not visibly suffer in the absence of action penalization in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
ROSep 19, 2019
Vision-Based Proprioceptive Sensing for Soft Inflatable ActuatorsPeter Werner, Matthias Hofer, Carmelo Sferrazza et al.
This paper presents a vision-based sensing approach for a soft linear actuator, which is equipped with an integrated camera. The proposed vision-based sensing pipeline predicts the three-dimensional position of a point of interest on the actuator. To train and evaluate the algorithm, predictions are compared to ground truth data from an external motion capture system. An off-the-shelf distance sensor is integrated in a similar actuator and its performance is used as a baseline for comparison. The resulting sensing pipeline runs at 40 Hz in real-time on a standard laptop and is additionally used for closed loop elongation control of the actuator. It is shown that the approach can achieve comparable accuracy to the distance sensor.