CV LG · Nov 15, 2024

Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

arXiv:2411.10013v2 · 1 citation · h-index: 1 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

This work addresses real-time depth estimation for AR applications and offers significant performance gains, but it is incremental because it builds on existing stereo depth estimation techniques.

The paper tackles the high latency of stereo depth estimation on AR glasses by developing methods that eliminate rectification preprocessing and replace cost-volume computation, yielding models that improve accuracy by up to 30.3% and reduce end-to-end latency by up to 44.5%.
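The preprocessing being eliminated is stereo rectification, which warps each image so that corresponding points fall on the same row. For an ideal pinhole camera with no lens distortion, that warp is a 3x3 homography applied to every pixel. The plain-Python sketch below shows this conventional per-pixel remapping with nearest-neighbor sampling; the function names and the toy shift matrix are hypothetical, real pipelines usually also undistort and interpolate bilinearly, and neither HomoDepth's homography prediction network nor its rectification positional encoding is reproduced here.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H (nested lists, row-major)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    return u / w, v / w  # projective division back to pixel coordinates


def invert3(H):
    """Inverse of a 3x3 matrix via its adjugate; raises if H is singular."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[val / det for val in row] for row in adj]


def warp_nearest(image, H, fill=0):
    """Backward-warp a 2D list by H using nearest-neighbor sampling.

    Each output pixel looks up its source location through the inverse
    homography: the per-pixel remapping a conventional rectification step does.
    """
    rows, cols = len(image), len(image[0])
    H_inv = invert3(H)
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            try:
                sx, sy = apply_homography(H_inv, x, y)
            except ValueError:
                continue  # source point at infinity: keep the fill value
            ix, iy = round(sx), round(sy)
            if 0 <= ix < cols and 0 <= iy < rows:
                out[y][x] = image[iy][ix]
    return out


if __name__ == "__main__":
    H_shift = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # hypothetical: shift one pixel right
    print(warp_nearest([[1, 2, 3], [4, 5, 6]], H_shift))
```

The loop visits every output pixel, so its cost grows with image resolution and sits outside the neural network, which is why removing this step from the pipeline can cut end-to-end latency.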

Stereo depth estimation is a fundamental component of augmented reality (AR), which requires low latency for real-time processing. However, preprocessing such as rectification and non-ML computations such as the cost volume incur substantial latency, exceeding that of the ML model itself, which hinders the real-time processing AR requires. We therefore develop alternatives to rectification and the cost volume that are designed for the ML accelerators (GPUs and NPUs) in recent hardware. We eliminate preprocessing by introducing a homography matrix prediction network with a rectification positional encoding (RPE), which delivers both low latency and robustness to unrectified images. We replace the cost volume with a group-pointwise convolution-based operator and an approximation of cosine similarity based on layer normalization (LayerNorm) and dot products. Based on these approaches, we develop two models: MultiHeadDepth (which replaces the cost volume) and HomoDepth (MultiHeadDepth plus removal of preprocessing). Compared to a state-of-the-art depth estimation model for AR glasses from industry, MultiHeadDepth improves accuracy by 11.8-30.3% and reduces latency by 22.9-25.2%. HomoDepth, which directly processes unrectified images, reduces end-to-end latency by 44.5%. We also introduce a multi-task learning method for handling misaligned stereo inputs in HomoDepth, which reduces the absolute relative error (AbsRel) by 10.0-24.3%. Overall, the results demonstrate that our approaches not only reduce inference latency but also improve model performance. Our code is available at https://github.com/UCI-ISA-Lab/MultiHeadDepth-HomoDepth
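Why LayerNorm plus a dot product can stand in for cosine similarity can be checked numerically. Assuming LayerNorm is applied without its learnable scale and shift (an assumption; the paper's exact operator may differ), each C-dimensional feature vector becomes zero-mean with unit variance, so the dot product of two normalized vectors divided by C equals the cosine similarity of the mean-centered vectors (their Pearson correlation), up to the small eps term. The function names below are hypothetical, and the per-group split in `multi_head_similarity` only illustrates channel grouping; it is not the paper's group-pointwise convolution operator.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance.

    Assumption: no learnable scale/shift (LayerNorm's affine terms disabled).
    eps keeps the division finite for constant vectors.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def ln_dot_similarity(a, b, eps=1e-5):
    """Dot product of the two layer-normalized vectors, divided by their length."""
    la, lb = layer_norm(a, eps), layer_norm(b, eps)
    return sum(p * q for p, q in zip(la, lb)) / len(a)


def centered_cosine(a, b):
    """Exact cosine similarity of the mean-centered vectors (Pearson correlation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [v - ma for v in a]
    cb = [v - mb for v in b]
    dot = sum(p * q for p, q in zip(ca, cb))
    return dot / (math.sqrt(sum(p * p for p in ca)) * math.sqrt(sum(q * q for q in cb)))


def multi_head_similarity(a, b, heads):
    """Hypothetical grouping: score each equal-sized channel group separately."""
    if len(a) != len(b) or len(a) % heads != 0:
        raise ValueError("vectors must have equal length divisible by heads")
    d = len(a) // heads
    return [ln_dot_similarity(a[h * d:(h + 1) * d], b[h * d:(h + 1) * d])
            for h in range(heads)]


if __name__ == "__main__":
    left = [0.2, 1.5, -0.3, 0.8, 2.0, -1.1, 0.4, 0.9]   # toy left-image feature
    right = [0.1, 1.4, -0.2, 0.7, 1.8, -1.0, 0.5, 1.1]  # toy right-image feature
    print(ln_dot_similarity(left, right), centered_cosine(left, right))
    print(multi_head_similarity(left, right, heads=2))
```

With eps set to 0 the two scores agree to floating-point precision; with a positive eps the LayerNorm version is slightly smaller in magnitude, which is the sense in which it approximates the centered cosine similarity.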
