Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching
This work addresses practical deployment issues in computer vision for applications such as robotics and autonomous driving, though its contribution is incremental because it builds on existing deep stereo methods.
The paper tackles the impracticality of deep stereo matching in applications, which stems from high memory usage and fixed disparity ranges. It proposes the Practical Deep Stereo (PDS) network, which reduces the memory footprint and handles any disparity range without retraining, and reports superior performance on the FlyingThings3D and KITTI datasets.
End-to-end deep-learning networks have recently demonstrated extremely good performance for stereo matching. However, existing networks are difficult to use in practical applications because (1) they are memory-hungry and unable to process even modest-size images, and (2) they have to be trained for a given disparity range. The Practical Deep Stereo (PDS) network that we propose addresses both issues. First, its architecture relies on novel bottleneck modules that drastically reduce the memory footprint at inference, and additional design choices allow it to handle larger images during training. This results in a model that leverages large image context to resolve matching ambiguities. Second, a novel sub-pixel cross-entropy loss combined with a MAP estimator makes the network less sensitive to ambiguous matches and applicable to any disparity range without re-training. We compare PDS to state-of-the-art methods published in recent months and demonstrate its superior performance on the FlyingThings3D and KITTI datasets.
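To illustrate the second point, here is a minimal sketch of how a sub-pixel cross-entropy loss and a MAP-style disparity estimator could work on a single pixel. The abstract does not give the formulation, so everything below is an assumption: the soft target is taken to be a discretized Laplace distribution centred on the ground-truth disparity, and the estimator is taken to be the mode of the predicted distribution refined by a local mean. The names `subpixel_map`, `subpixel_cross_entropy`, `half_width`, and `scale` are hypothetical and chosen for this example only.

```python
import math


def subpixel_map(probs, half_width=2):
    """Assumed MAP-style estimate: find the mode, then average in a window around it."""
    mode = max(range(len(probs)), key=probs.__getitem__)
    lo = max(0, mode - half_width)
    hi = min(len(probs) - 1, mode + half_width)
    mass = sum(probs[d] for d in range(lo, hi + 1))
    return sum(d * probs[d] for d in range(lo, hi + 1)) / mass


def full_mean(probs):
    """Mean over the whole distribution, shown only for comparison."""
    return sum(d * p for d, p in enumerate(probs))


def subpixel_cross_entropy(probs, target_disparity, scale=1.0):
    """Cross-entropy against an assumed soft target (discretized Laplace)."""
    weights = [math.exp(-abs(d - target_disparity) / scale)
               for d in range(len(probs))]
    total = sum(weights)
    # The small constant guards against log(0) for empty bins.
    return -sum(w / total * math.log(p + 1e-12)
                for w, p in zip(weights, probs))


# Toy ambiguous match: a weaker peak at disparity 10, a stronger one near 30-31.
probs = [0.0] * 40
probs[10], probs[30], probs[31] = 0.25, 0.35, 0.40

print(round(subpixel_map(probs), 2))  # 30.53, inside the dominant peak
print(round(full_mean(probs), 2))     # 25.4, between the peaks where no mass lies
```

In this toy case the windowed estimate stays within the dominant peak, while the mean over the full distribution lands between the two peaks. The estimator also has no parameter tied to the number of disparity bins, which is one plausible reading of why the disparity range could be changed without re-training.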