RO, AI, LG · Jan 24, 2025

ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards

arXiv:2501.14513v2 · 3 citations · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a specific challenge in reinforcement learning for robotics, offering incremental improvements for quadrotor control with partially differentiable rewards.

The paper tackles the training of quadrotor control policies with partially differentiable rewards, which bias the gradients and degrade performance. It proposes ABPT, a method that combines 0-step and N-step returns to reduce this bias, and it reports faster convergence and higher rewards in flight tasks.

Quadrotor control policies can be trained with high performance using the exact gradients of the rewards to directly optimize policy parameters via backpropagation-through-time (BPTT). However, designing a fully differentiable reward architecture is often challenging. Partially differentiable rewards will result in biased gradient propagation that degrades training performance. To overcome this limitation, we propose Amended Backpropagation-through-Time (ABPT), a novel approach that mitigates gradient bias while preserving the training efficiency of BPTT. ABPT combines 0-step and N-step returns, effectively reducing the bias by leveraging value gradients from the learned Q-value function. Additionally, it adopts entropy regularization and state initialization mechanisms to encourage exploration during training. We evaluate ABPT on four representative quadrotor flight tasks in both the real world and simulation. Experimental results demonstrate that ABPT converges significantly faster and achieves higher ultimate rewards than existing learning algorithms, particularly in tasks involving partially differentiable rewards. The code will be released at http://github.com/Fanxing-LI/ABPT.
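
The idea of combining 0-step and N-step returns can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal weighting (`weight=0.5`), and the use of plain floats are assumptions made for illustration. In actual training the rewards and Q-values would be differentiable tensors, so that gradients flow through the rollout (N-step term) and through the learned Q-value function (0-step term).

```python
def n_step_return(rewards, terminal_value, gamma):
    """Discounted sum of N rewards, bootstrapped with a terminal value estimate.

    Computes r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * terminal_value.
    """
    ret = terminal_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def combined_return(rewards, q_start, q_end, gamma=0.99, weight=0.5):
    """Blend a 0-step return with an N-step return (hypothetical weighting).

    q_start: Q-value estimate at the first state-action pair (the 0-step return).
    q_end:   Q-value estimate at the state reached after N steps.
    weight:  assumed mixing coefficient; the paper may combine the terms differently.
    """
    zero_step = q_start
    n_step = n_step_return(rewards, q_end, gamma)
    return weight * zero_step + (1.0 - weight) * n_step


if __name__ == "__main__":
    # Two-step rollout with unit rewards and a zero terminal value estimate.
    print(combined_return([1.0, 1.0], q_start=2.0, q_end=0.0, gamma=0.5))
```

The 0-step term depends on the rollout only through the learned Q-value, which is why it can supply a gradient signal even where the reward itself is not differentiable; the N-step term keeps the exact reward gradients wherever they exist.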

Code Implementations: 1 repo