FlowNet3D: Learning Scene Flow in 3D Point Clouds
This work addresses the need for 3D motion understanding in robotics and human-computer interaction by estimating scene flow directly from point clouds, an input that few previous methods handle.
The paper proposes FlowNet3D, a deep neural network that learns scene flow from 3D point clouds end-to-end. It outperforms various baselines on synthetic data from FlyingThings3D and on real Lidar scans from KITTI.
Many applications in robotics and human-computer interaction can benefit from understanding the 3D motion of points in a dynamic environment, commonly known as scene flow. While most previous methods take stereo or RGB-D images as input, few attempt to estimate scene flow directly from point clouds. In this work, we propose a novel deep neural network named FlowNet3D that learns scene flow from point clouds in an end-to-end fashion. Our network simultaneously learns deep hierarchical features of point clouds and flow embeddings that represent point motions, supported by two newly proposed learning layers for point sets. We evaluate the network on both challenging synthetic data from FlyingThings3D and real Lidar scans from KITTI. Trained on synthetic data only, our network successfully generalizes to real scans, outperforming various baselines and achieving results competitive with the prior art. We also demonstrate two applications of our scene flow output (scan registration and motion segmentation) to show its potential for wide use.
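To make the notion of scene flow concrete, the following minimal sketch treats it as one 3D displacement vector per point in the first frame. It is an illustration only, not the FlowNet3D network or the paper's application pipelines: the toy coordinates, the function names `warp_points` and `moving_mask`, and the threshold value are all assumptions introduced here. Warping the first frame by its flow hints at how the output could support scan registration, and thresholding the flow magnitude is a deliberately crude stand-in for motion segmentation.

```python
import math

# Hypothetical toy data: three frame-1 points (x, y, z) and one flow
# vector per point, describing where each point moves by frame 2.
points_t1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
scene_flow = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, -1.0, 0.0)]


def warp_points(points, flow):
    """Translate each point by its own flow vector."""
    if len(points) != len(flow):
        raise ValueError("need exactly one flow vector per point")
    return [
        tuple(p + d for p, d in zip(point, vec))
        for point, vec in zip(points, flow)
    ]


def moving_mask(flow, threshold=0.1):
    """Mark points whose flow magnitude exceeds an assumed threshold."""
    return [math.sqrt(sum(d * d for d in vec)) > threshold for vec in flow]


warped = warp_points(points_t1, scene_flow)
mask = moving_mask(scene_flow)
```

In this sketch `warped` holds the predicted frame-2 positions of the frame-1 points, and `mask` separates the static first point from the two moving ones. A learned model would replace the hand-written `scene_flow` list with predictions computed from the two input point clouds.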