RO · May 10

Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

Changchuan Yang, Yuhang Dong, Guanzhong Tian, Haizhou Ge, Hongrui Zhu

arXiv:2504.04991 · 12.5 · h-index: 4 · Has Code

Predicted impact: top 38% in RO · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the need for efficient long-horizon visuomotor imitation learning in robotics by combining scale-domain action modeling with world-prior memory, offering a practical solution for embodied manipulation.

Wavelet Policy introduces a lightweight imitation learning framework that uses wavelet-based multi-scale action modeling and World Prior Memory to encode persistent scene structure, achieving consistent improvements over strong baselines on four simulated and six real-world robotic manipulation tasks.

Conventional visuomotor imitation learning usually predicts future robot actions directly in the time domain. Such formulations often have limited physical scene awareness and weak long-horizon memory. World-model-based perception and memory-augmented policies can improve world awareness, but only at substantial computational overhead. In this work, we propose Wavelet Policy, a lightweight imitation learning framework that combines World Prior Memory (WPM) with wavelet-based multi-scale action modeling. Our key idea is to encode persistent physical scene structure from static background images into compact memory tokens, which are fused into world-prior tokens and injected into the encoder during forward propagation. Based on this memory-conditioned representation, we further perform wavelet-domain decomposition over horizon-aligned latent action tokens and adopt a Single-Encoder Multiple-Decoder (SE2MD) architecture to model latent components at different temporal scales. The resulting latent subbands are reconstructed through an inverse wavelet transform and finally projected into executable action chunks. To facilitate efficient world-prior learning, we introduce a world-prior adaptation loss that encourages the background encoder to retain persistent scene knowledge while remaining lightweight and stable. Extensive experiments on four simulated and six real-world robotic manipulation tasks show that Wavelet Policy consistently outperforms strong baselines. These results demonstrate that combining scale-domain action modeling with world-prior memory provides an effective and efficient solution for long-horizon embodied manipulation. We release the source code, data, and model checkpoints for the simulation tasks at https://github.com/lurenjia384/Wavelet_Policy.
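The decomposition and reconstruction steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet applied along the horizon axis of a small token sequence; the abstract does not say which wavelet family, how many levels, or what token shapes the authors use, so `haar_dwt`, `haar_idwt`, and the toy `chunk` are hypothetical and are not taken from the released code. The learned per-subband decoders of SE2MD are omitted; the sketch only shows that the two subbands carry the full sequence and that the inverse transform recovers it.

```python
import math

# Assumption: orthonormal one-level Haar wavelet; the paper may use another family.
SCALE = 1.0 / math.sqrt(2.0)


def haar_dwt(tokens):
    """Split a horizon-aligned sequence into coarse and fine subbands.

    tokens: list of H vectors (H must be even), each a list of D floats.
    Returns (approx, detail), each a list of H // 2 vectors.
    """
    if len(tokens) % 2 != 0:
        raise ValueError("horizon length must be even")
    approx, detail = [], []
    for i in range(0, len(tokens), 2):
        first, second = tokens[i], tokens[i + 1]
        # Low-pass: scaled sum of neighbouring steps (slow trend).
        approx.append([(x + y) * SCALE for x, y in zip(first, second)])
        # High-pass: scaled difference of neighbouring steps (fast change).
        detail.append([(x - y) * SCALE for x, y in zip(first, second)])
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, interleaving the reconstructed time steps."""
    tokens = []
    for low, high in zip(approx, detail):
        tokens.append([(l + h) * SCALE for l, h in zip(low, high)])
        tokens.append([(l - h) * SCALE for l, h in zip(low, high)])
    return tokens


# Toy "latent action chunk": horizon H = 4, feature dimension D = 2.
chunk = [[0.0, 1.0], [0.2, 1.0], [0.4, 0.5], [0.6, 0.5]]
approx, detail = haar_dwt(chunk)
recon = haar_idwt(approx, detail)
```

In this sketch each subband has half the horizon length of the input, and a model in the spirit of the abstract would process `approx` and `detail` with separate decoders before calling the inverse transform.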
