BANet: Bilateral Aggregation Network for Mobile Stereo Matching
This work addresses efficient stereo matching for mobile applications, offering a mobile-friendly network that improves both accuracy and speed.
The paper tackles the problem of deploying stereo matching on mobile devices. It proposes BANet, a bilateral aggregation network that uses only 2D convolutions yet produces high-quality results with sharp edges and fine details; on KITTI 2015 it is 35.3% more accurate than MobileStereoNet-2D while also running faster.
State-of-the-art stereo matching methods typically use costly 3D convolutions to aggregate a full cost volume, but their computational demands make mobile deployment challenging. Directly applying 2D convolutions for cost aggregation often results in edge blurring, detail loss, and mismatches in textureless regions. Some complex operations, like deformable convolutions and iterative warping, can partially alleviate this issue; however, they are not mobile-friendly, which limits their deployment on mobile devices. In this paper, we present a novel bilateral aggregation network (BANet) for mobile stereo matching that produces high-quality results with sharp edges and fine details using only 2D convolutions. Specifically, we first separate the full cost volume into detailed and smooth volumes using a spatial attention map, then perform detailed and smooth aggregations accordingly, and finally fuse both to obtain the final disparity map. Experimental results demonstrate that our BANet-2D significantly outperforms other mobile-friendly methods, achieving 35.3% higher accuracy on the KITTI 2015 leaderboard than MobileStereoNet-2D, with faster runtime on mobile devices. Code: https://github.com/gangweix/BANet
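To make the bilateral split concrete, here is a toy, pure-Python sketch of the data flow: cost volume, then attention-based split into detailed and smooth volumes, then separate aggregation, fusion, and disparity readout. It is not the BANet implementation (the authors' code is in the BANet repository). Every function name is hypothetical, and each component is an assumption standing in for a learned one: the spatial attention map is replaced by a gradient-based proxy (`edge_attention`), the learned detailed and smooth 2D aggregation networks are replaced by fixed box filters with a small and a large window, fusion is a plain sum, the matching score is a negative mean absolute difference, and disparity is read out with a soft-argmax. Filtering each disparity slice as an ordinary 2D plane stands in for 2D convolutions over the cost volume, so no 3D convolution is involved.

```python
import math


def cost_volume(left, right, max_disp, invalid=-1.0):
    """Build a D x H x W volume of matching scores from H x W x C feature maps.

    score[d][y][x] = -mean_c |left[y][x][c] - right[y][x - d][c]|, so higher is
    better. Pixels with x - d < 0 have no partner and get `invalid` (-1.0 is the
    worst possible score when features lie in [0, 1]).
    """
    h, w, c = len(left), len(left[0]), len(left[0][0])
    vol = []
    for d in range(max_disp):
        plane = [[invalid] * w for _ in range(h)]
        for y in range(h):
            for x in range(d, w):
                diff = sum(abs(a - b) for a, b in zip(left[y][x], right[y][x - d]))
                plane[y][x] = -diff / c
        vol.append(plane)
    return vol


def box_filter(plane, radius):
    """Mean over a (2*radius + 1)^2 window, clipped at the borders (radius 0 = identity)."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [plane[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def edge_attention(image, gain=10.0):
    """HYPOTHETICAL stand-in for the learned spatial attention map:
    A = 1 - exp(-gain * |grad I|), near 1 on intensity edges and 0 in flat areas."""
    h, w = len(image), len(image[0])
    att = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Unnormalized central differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            att[y][x] = 1.0 - math.exp(-gain * math.hypot(gx, gy))
    return att


def bilateral_aggregate(vol, att, detail_radius=0, smooth_radius=2):
    """Split every disparity slice into A * C (detailed) and (1 - A) * C (smooth),
    filter each with its own window size, and fuse the two by addition."""
    fused = []
    for plane in vol:
        h, w = len(plane), len(plane[0])
        detailed = [[att[y][x] * plane[y][x] for x in range(w)] for y in range(h)]
        smooth = [[(1.0 - att[y][x]) * plane[y][x] for x in range(w)] for y in range(h)]
        detailed = box_filter(detailed, detail_radius)
        smooth = box_filter(smooth, smooth_radius)
        fused.append([[detailed[y][x] + smooth[y][x] for x in range(w)] for y in range(h)])
    return fused


def soft_argmax(vol, temperature=0.01):
    """Per-pixel expected disparity under a softmax of score / temperature over d."""
    n_disp, h, w = len(vol), len(vol[0]), len(vol[0][0])
    disp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            logits = [vol[d][y][x] / temperature for d in range(n_disp)]
            m = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(v - m) for v in logits]
            disp[y][x] = sum(d * wt for d, wt in enumerate(weights)) / sum(weights)
    return disp


def estimate_disparity(left_img, right_img, max_disp, att=None):
    """Toy pipeline: cost volume -> bilateral split/aggregation -> soft-argmax."""
    feats_l = [[[v] for v in row] for row in left_img]  # 1-channel features
    feats_r = [[[v] for v in row] for row in right_img]
    vol = cost_volume(feats_l, feats_r, max_disp)
    if att is None:
        att = edge_attention(left_img)
    return soft_argmax(bilateral_aggregate(vol, att))


if __name__ == "__main__":
    row = [0.0, 1.0, 0.3, 0.8, 0.1, 0.9, 0.4, 0.0, 0.7, 0.2, 1.0, 0.5]
    left = [row[:] for _ in range(5)]
    # The right view sees the same texture shifted 2 px to the left (true disparity 2).
    right = [[r[x + 2] if x + 2 < len(r) else 0.0 for x in range(len(r))] for r in left]
    print([round(v, 2) for v in estimate_disparity(left, right, max_disp=4)[2]])
```

In this toy setup, passing `att=[[1.0] * 12 for _ in range(5)]` switches off the smooth branch, and the interior pixels (x = 3 to 9) then come out within 0.01 of the true disparity 2. With the default edge-based attention, low-gradient pixels instead borrow scores from their neighbours through the wide window. That is the trade-off the detailed/smooth split is meant to control: sharp edges where the attention is high and stable estimates in flat regions where it is low.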