Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
For SLAM researchers and practitioners, LeanGate addresses the computational bottleneck of dense geometric decoding in GFM-based systems: it rejects redundant frames before the heavy decoding stages run, while maintaining the tracking and mapping accuracy of dense baselines.
LeanGate reduces computational redundancy in GFM-based monocular SLAM by predicting a geometric utility score for each frame and bypassing over 90% of redundant frames before dense decoding, cutting tracking FLOPs by more than 85% and yielding a 5x end-to-end throughput speedup while maintaining accuracy.
Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection. Because of this, they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, resulting in late rejection and wasted computation. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score to assess a frame's mapping value prior to the heavy GFM feature extraction and matching stages. As a predictive plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks demonstrate that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup. Furthermore, it maintains the tracking and mapping accuracy of dense baselines.
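The gating idea described above can be illustrated with a minimal Python sketch. It is not the LeanGate implementation: `gate_frames`, `relative_cost`, `GateStats`, the threshold, and the cost figures are all hypothetical names and values chosen for illustration, and `score_fn` stands in for the learned utility predictor, whose architecture and inputs the abstract does not specify. The sketch assumes the skip ratio is measured over all incoming frames and that the gate's cost is paid on every frame.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


@dataclass
class GateStats:
    """Counts how many frames were seen and how many passed the gate."""
    total: int = 0
    kept: int = 0

    @property
    def skip_ratio(self) -> float:
        # Fraction of all incoming frames rejected by the gate.
        return 0.0 if self.total == 0 else 1.0 - self.kept / self.total


def gate_frames(
    frames: Sequence[Frame],
    score_fn: Callable[[Frame], float],  # placeholder for a learned scorer
    threshold: float,                    # hypothetical decision threshold
) -> Tuple[List[int], GateStats]:
    """Return indices of frames forwarded to the heavy pipeline, plus stats.

    Frames whose predicted utility falls below the threshold are rejected
    before any expensive feature extraction or matching would run.
    """
    kept: List[int] = []
    stats = GateStats()
    for index, frame in enumerate(frames):
        stats.total += 1
        if score_fn(frame) >= threshold:
            kept.append(index)
            stats.kept += 1
    return kept, stats


def relative_cost(skip_ratio: float, gate_cost: float, heavy_cost: float) -> float:
    """Average per-frame cost with gating, relative to ungated processing.

    Every frame pays gate_cost; only the non-skipped fraction pays heavy_cost.
    """
    if heavy_cost <= 0:
        raise ValueError("heavy_cost must be positive")
    return (gate_cost + (1.0 - skip_ratio) * heavy_cost) / heavy_cost


if __name__ == "__main__":
    # Toy input: each "frame" is just its own made-up utility score.
    toy_frames = [0.9, 0.1, 0.05, 0.2, 0.8, 0.1, 0.0, 0.3, 0.1, 0.2]
    kept, stats = gate_frames(toy_frames, score_fn=lambda s: s, threshold=0.5)
    print("kept frame indices:", kept)
    print("skip ratio:", stats.skip_ratio)
    # Illustrative cost figures only, not measurements from the paper.
    print("relative cost:", relative_cost(0.9, gate_cost=0.05, heavy_cost=1.0))
```

Under this simple cost model the saving is bounded by the gate's own cost: with a 90% skip ratio, the gate has to be much cheaper than the heavy stage for the overall reduction to stay large, which is consistent with the abstract's emphasis on a lightweight feed-forward network.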