CV · May 13, 2019

Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning

arXiv:1905.05212v1 · 23 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient depth estimation on the low-end devices used in robotics and virtual reality. It is an incremental advance because it builds on existing pruning techniques.

The paper tackles the high memory and compute requirements of monocular depth estimation models. It proposes a joint end-to-end filter pruning method that derives a lightweight model from a large trained one, achieving around 5x compression with a small drop in accuracy on the KITTI dataset.

Convolutional neural networks (CNNs) have emerged as the state of the art in multiple vision tasks, including depth estimation. However, their memory and computing power requirements remain challenges. Monocular depth estimation has significant uses in robotics and virtual reality, which require deployment on low-end devices. Training a small model from scratch results in a significant drop in accuracy and does not benefit from pre-trained large models. Motivated by the literature on model pruning, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning. We propose to learn a binary mask for each filter to decide whether to drop the filter or not. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve around a 5x compression rate with a small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing a compression loss.
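The per-filter masking idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the score values, the 0.5 threshold, the two-layer toy network, and every function name below are assumptions chosen for illustration, and the mask training itself is omitted. The sketch only shows how hard 0/1 masks translate into a parameter count, and why masks in one layer also affect the next layer, whose input channels shrink with them.

```python
# Illustrative sketch only: names, scores, and the threshold are hypothetical,
# and mask learning (the core of the paper) is not shown here.

def binarize(scores, threshold=0.5):
    """Turn real-valued per-filter scores into hard 0/1 keep masks."""
    return [1 if s > threshold else 0 for s in scores]

def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

def compression_rate(layers, masks, in_ch):
    """Ratio of dense to pruned parameters for a plain chain of convolutions.

    layers: list of (out_channels, kernel_size) pairs
    masks:  one 0/1 list per layer, one entry per filter
    """
    dense = pruned = 0
    dense_in = pruned_in = in_ch
    for (out_ch, k), mask in zip(layers, masks):
        kept = sum(mask)
        dense += conv_params(dense_in, out_ch, k)
        pruned += conv_params(pruned_in, kept, k)
        # Dropping a filter here removes an input channel of the next layer.
        dense_in, pruned_in = out_ch, kept
    return dense / pruned

# Toy network: RGB input, two 3x3 convolutions with 4 filters each.
layers = [(4, 3), (4, 3)]
scores = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.3, 0.6, 0.4]]
masks = [binarize(s) for s in scores]
rate = compression_rate(layers, masks, in_ch=3)
print(masks, round(rate, 2))
```

In this toy case half of the filters in each layer are kept, yet the parameter count falls by more than half, because the second layer loses both output filters and input channels. That coupling between layers is one reason to decide the masks jointly rather than layer by layer.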

Code Implementations: 1 repo