Autoregressive Flow Matching for Motion Prediction
This work addresses motion prediction for robotics and human motion, presenting a new method and showing that its predictions improve downstream task performance.
The paper tackles motion prediction by developing autoregressive flow matching (ARFM), a method for modeling sequential continuous data, and training it on diverse video datasets. It demonstrates that conditioning robot action prediction and human motion prediction on predicted future tracks significantly improves downstream task performance.
Motion prediction has been studied in different contexts, with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Meanwhile, recent efforts to scale video prediction have demonstrated impressive visual realism, yet these models struggle to accurately model complex motions despite their massive scale. Inspired by the scaling of video generation, we develop autoregressive flow matching (ARFM), a new method for probabilistic modeling of sequential continuous data, and train it on diverse video datasets to generate future point track locations over long horizons. To evaluate our model, we develop benchmarks that measure how well motion prediction models predict human and robot motion. Our model predicts complex motions, and we demonstrate that conditioning robot action prediction and human motion prediction on predicted future tracks can significantly improve downstream task performance. Code and models are publicly available at: https://github.com/Johnathan-Xie/arfm-motion-prediction.