Efficient stereo matching on embedded GPUs with zero-means cross correlation
This work addresses the need for efficient and accurate stereo matching in mobile applications like autonomous vehicles and robots, offering an incremental improvement in speed and accuracy for embedded systems.
The paper tackles the trade-off between accuracy and computational efficiency in stereo matching on embedded GPUs by proposing an acceleration method for zero-means normalized cross correlation (ZNCC) that uses zigzag scanning to reuse pixel computations, achieving a 2x speedup over traditional methods and enabling real-time processing at 32 fps on a Jetson Tx2 GPU for 1280x384 pixel images.
Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually have high computational complexity, whereas mobile platforms have only limited hardware resources in order to keep their power consumption low. This makes it difficult to maintain both an acceptable processing speed and acceptable accuracy on mobile platforms. To resolve this trade-off, we propose a novel acceleration approach for the well-known zero-means normalized cross correlation (ZNCC) matching cost calculation algorithm on a Jetson Tx2 embedded GPU. In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels; this reduces the amount of data transmission and increases the utilization of on-chip registers, thus increasing the processing speed. As a result, our method is 2x faster than the traditional image scanning method and 26% faster than the latest NCC method. By combining this technique with the domain transformation (DT) algorithm, our system achieves a real-time processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with a maximum disparity of 128. Additionally, the evaluation results on the KITTI 2015 benchmark show that our combined system is 7.26% more accurate than the same algorithm combined with census, while maintaining almost the same processing speed.
Source Code: https://github.com/changqiong/Z2ZNCC.git
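To make the matching cost concrete, the following is a minimal CPU sketch in Python of the standard ZNCC definition for one pixel and one disparity, together with a one-dimensional sliding-sum helper that illustrates the general idea of reusing a neighboring pixel's computation. It is not the authors' CUDA kernel and does not implement the zigzag scan; the function names `zncc` and `window_sums_sliding`, the window radius `r`, and the list-of-lists image layout are all assumptions made for illustration.

```python
import math


def zncc(left, right, y, x, d, r):
    """ZNCC between the (2r+1)x(2r+1) window centred at (y, x) in `left`
    and the window centred at (y, x - d) in `right` (hypothetical helper)."""
    rows = range(y - r, y + r + 1)
    cols = range(x - r, x + r + 1)
    a = [left[j][i] for j in rows for i in cols]
    b = [right[j][i - d] for j in rows for i in cols]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Subtracting the window means makes the cost robust to brightness offsets.
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    denom = math.sqrt(var_a * var_b)
    # A flat window has zero variance; return 0.0 rather than divide by zero.
    return cov / denom if denom > 0 else 0.0


def window_sums_sliding(img, y, r):
    """Window sums along row y, computed by reusing per-column sums.

    Moving the window one pixel to the right adds one column sum and drops
    another, instead of re-reading all (2r+1)^2 pixels. This is a simplified
    stand-in for computation reuse, not the zigzag scheme itself.
    """
    width = len(img[0])
    col = [sum(img[j][i] for j in range(y - r, y + r + 1)) for i in range(width)]
    s = sum(col[0:2 * r + 1])
    sums = {r: s}
    for x in range(r + 1, width - r):
        s += col[x + r] - col[x - r - 1]
        sums[x] = s
    return sums
```

In a full matcher, `zncc` would be evaluated for every disparity from 0 up to the maximum (128 in the configuration above) and the disparity with the highest score kept; the running sums needed for the means and variances are the kind of quantity that can be shared between neighboring pixels.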