FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
This work addresses a problem critical to applications such as autonomous driving and AR/VR by improving scene flow estimation from point clouds, though it appears incremental, since it extends existing methods with new attention mechanisms.
The paper tackles the challenge of estimating scene flow from sparse and irregular point clouds. It proposes a Spatial Abstraction with Attention (SA^2) layer to address unstable abstraction and a Temporal Abstraction with Attention (TA^2) layer to handle larger motion ranges, and it reports significant performance gains over several state-of-the-art scene flow estimation benchmarks.
Scene flow depicts the dynamics of a 3D scene, which is critical for applications such as autonomous driving, robot navigation, and AR/VR. Conventionally, scene flow is estimated from dense, regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds, which has sparked new research in 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity of typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction/feature extraction, an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is accordingly proposed to alleviate the unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, which benefits motions scaled over a larger range. Extensive analysis and experiments verify the motivation and the significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art benchmarks of scene flow estimation.
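The randomness in point set abstraction mentioned above can be illustrated with a small sketch. The abstract does not name the sampling scheme, so this example assumes farthest point sampling (FPS), a common choice in PointNet++-style set abstraction. The function `farthest_point_sampling`, the synthetic `cloud`, and all parameter values are hypothetical and are not taken from the paper. The sketch only shows that the chosen centroids can depend on an arbitrary starting point; it does not implement the SA^2 layer.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k, start):
    """Greedy FPS (assumed scheme): repeatedly pick the point farthest
    from the set of points chosen so far. Returns the chosen indices."""
    chosen = [start]
    # Distance from every point to its nearest chosen point.
    nearest = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, sq_dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen


# Hypothetical synthetic cloud: 200 random points in the unit cube.
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]

# Same cloud, two different (arbitrary) starting points.
run_a = set(farthest_point_sampling(cloud, 16, start=0))
run_b = set(farthest_point_sampling(cloud, 16, start=1))

overlap = len(run_a & run_b)
print(f"centroids shared by the two runs: {overlap} of 16")
```

If the two runs share only some of their centroids, features abstracted around those centroids will differ between runs even though the underlying scene is identical, which is the kind of instability an attention-based abstraction layer is meant to reduce.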