RO · AI · LG · SY · DS · Mar 16, 2022

Multiscale Sensor Fusion and Continuous Control with Neural CDEs

arXiv:2203.08715v1 · 5 citations · h-index: 51
Originality: Highly original
AI Analysis

This work addresses asynchronous sensing in robotics, enabling end-to-end visuomotor control for physical robots whose sensors report at different rates.

The paper tackles robot learning's need for near-continuous multiscale feedback control by proposing InFuser, a unified architecture that uses Neural Controlled Differential Equations (CDEs) to train continuous-time policies. These policies integrate and fuse multi-sensory observations that arrive at different frequencies. In behavior cloning experiments, InFuser learns robust policies for dynamic tasks and notably outperforms several baselines.
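As background for the summary above, the generic neural CDE form can be written as follows. This is the standard formulation of a neural CDE, not necessarily the exact parameterization used in the paper; the symbols $f_\theta$, $\zeta_\theta$, and the action readout $\pi_\phi$ are notation assumed here for illustration.

```latex
\begin{aligned}
z(t_0) &= \zeta_\theta\big(X(t_0)\big), \\
z(t)   &= z(t_0) + \int_{t_0}^{t} f_\theta\big(z(s)\big)\,\mathrm{d}X(s), \\
a(t)   &= \pi_\phi\big(z(t)\big).
\end{aligned}
```

Here $X$ is a continuous path built by interpolating the sensor observations, $z(t) \in \mathbb{R}^{h}$ is the latent state, and $f_\theta(z) \in \mathbb{R}^{h \times c}$ is a matrix-valued vector field for a path with $c$ channels. Wherever $X$ is differentiable, $\mathrm{d}X(s) = \dot{X}(s)\,\mathrm{d}s$, so the integral reduces to an ordinary differential equation that a standard ODE solver can handle.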

Though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control. Machines operate on multiple asynchronous sensing modalities, each with a different frequency, e.g., video frames at 30 Hz, proprioceptive state at 100 Hz, force-torque data at 500 Hz, etc. While the classic approach is to batch observations into fixed-time windows and then pass them through feed-forward encoders (e.g., with deep networks), we show that there exists a more elegant approach -- one that treats policy learning as modeling latent state dynamics in continuous time. Specifically, we present 'InFuser', a unified architecture that trains continuous-time policies with Neural Controlled Differential Equations (CDEs). InFuser evolves a single latent state representation over time by (In)tegrating and (Fus)ing multi-sensory observations (arriving at different frequencies), and inferring actions in continuous time. This enables policies that can react to multi-frequency, multi-sensory feedback for truly end-to-end visuomotor control, without discrete-time assumptions. Behavior cloning experiments demonstrate that InFuser learns robust policies for dynamic tasks (e.g., swinging a ball into a cup), notably outperforming several baselines in settings where observations from one sensing modality can arrive at much sparser intervals than others.
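The idea of integrating and fusing streams that arrive at different rates can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar sensor streams, linear interpolation onto the union of all timestamps, a random (untrained) vector field standing in for a learned network, and an explicit Euler step in place of an adaptive solver. All function names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def sample_stream(times, values, t):
    """Linearly interpolate one sensor stream at time t (clamped at both ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    w = (t - times[i - 1]) / (times[i] - times[i - 1])
    return (1 - w) * values[i - 1] + w * values[i]


def control_path(streams):
    """Merge asynchronous scalar streams into one path on the union of timestamps.

    Each point is [t, x_1(t), ..., x_m(t)]; time is kept as its own channel.
    """
    grid = sorted({t for times, _ in streams for t in times})
    return [[t] + [sample_stream(ts, vs, t) for ts, vs in streams] for t in grid]


def make_vector_field(hidden, channels, seed=0):
    """Toy stand-in for a learned f_theta: z -> tanh(W z + b) as a hidden x channels matrix."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.3) for _ in range(hidden)] for _ in range(hidden * channels)]
    b = [rng.gauss(0, 0.3) for _ in range(hidden * channels)]

    def f(z):
        flat = [math.tanh(sum(w * zj for w, zj in zip(row, z)) + bi)
                for row, bi in zip(W, b)]
        return [flat[i * channels:(i + 1) * channels] for i in range(hidden)]

    return f


def integrate_cde(path, f, z0):
    """Explicit Euler scheme for a CDE: z_{k+1} = z_k + f(z_k) (X_{k+1} - X_k)."""
    z = list(z0)
    trajectory = [z]
    for prev, curr in zip(path, path[1:]):
        dX = [c - p for p, c in zip(prev, curr)]
        F = f(z)
        z = [zi + sum(Fij * dxj for Fij, dxj in zip(row, dX))
             for zi, row in zip(z, F)]
        trajectory.append(z)
    return trajectory


# Usage: a sparse "camera" stream fused with a denser "joint" stream.
camera = ([0.0, 0.5, 1.0], [0.0, 1.0, 0.0])
joints = ([i / 10 for i in range(11)], [math.sin(i / 10) for i in range(11)])
path = control_path([camera, joints])
latents = integrate_cde(path, make_vector_field(hidden=4, channels=3), z0=[0.0] * 4)
```

The latent state is updated only through increments of the merged path, so no fixed-size observation window is needed and each stream contributes whenever its value changes. Linear interpolation looks ahead to the next observation, so an online controller would need a causal interpolation scheme instead.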
