CV
Dec 26, 2022

PMODE: Prototypical Mask based Object Dimension Estimation

arXiv:2212.13281v1
h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses real-time object dimension estimation in uncontrolled environments. The contribution is incremental: it builds on existing segmentation and regression techniques.

The paper tackles the problem of estimating object dimensions from monocular video without camera calibration or handcrafted features, achieving a mean absolute percentage error (MAPE) of 22% on the test dataset.
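For readers unfamiliar with the metric, MAPE (mean absolute percentage error) averages the absolute error of each prediction relative to its true value. The following is a minimal sketch of the standard definition, not the paper's evaluation code; the function name `mape` and the sample values are illustrative assumptions and are not taken from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Standard definition: 100 * mean(|true - pred| / |true|).
    The inputs here are hypothetical; the paper's data is not reproduced.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0:
            # The relative error is undefined for a true value of zero.
            raise ValueError("true values must be non-zero")
        total += abs((t - p) / t)
    return 100.0 * total / len(y_true)


# Illustrative values only (e.g. object widths in centimetres).
true_dims = [10.0, 20.0]
pred_dims = [12.0, 15.0]
print(mape(true_dims, pred_dims))  # approximately 22.5
```

Because each error is divided by the true value, small objects contribute as much to the score as large ones, which suits a task where object sizes vary.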

Can a neural network estimate an object's dimension in the wild? In this paper, we propose a method and deep learning architecture to estimate the dimensions of a quadrilateral object of interest in videos using a monocular camera. The proposed technique does not use camera calibration or handcrafted geometric features; instead, features are learned with the help of the coefficients of a segmentation neural network during training. A real-time instance-segmentation Deep Neural Network with a ResNet50 backbone is employed; it produces the object's prototype mask and thus provides a region of interest from which to regress its dimensions. The instance segmentation network is trained to look only at the nearest object of interest. The regression is performed by an MLP head that looks only at the mask coefficients of the bounding box detector head and the prototype segmentation mask. We trained the system with three different random cameras, achieving 22% MAPE on the test dataset for dimension estimation.
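The data flow described in the abstract, from mask coefficients and prototype masks to an MLP regression head, might be sketched roughly as follows. This is a hedged illustration, not the authors' implementation. Several details are assumptions the abstract does not state: that the instance mask is a sigmoid of a linear combination of prototypes, that the head receives the coefficients concatenated with the flattened mask, that it outputs two values (width and height), and every name, shape, and layer size used here (`assemble_mask`, `DimensionHead`, `estimate_dimensions`). Weights are random, so the outputs are meaningless until trained.

```python
import numpy as np


def assemble_mask(prototypes, coeffs):
    """Combine K prototype masks into one instance mask.

    Assumption: mask = sigmoid(sum_k coeffs[k] * prototypes[..., k]).
    prototypes: array of shape (H, W, K); coeffs: array of shape (K,).
    """
    logits = prototypes @ coeffs          # shape (H, W)
    return 1.0 / (1.0 + np.exp(-logits))  # values in [0, 1]


class DimensionHead:
    """Tiny two-layer MLP with random, untrained weights (illustration only)."""

    def __init__(self, in_features, hidden=64, out_features=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_features))
        self.b2 = np.zeros(out_features)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2                # linear regression output


def estimate_dimensions(prototypes, coeffs, head):
    """Regress dimensions from the mask coefficients and the assembled mask."""
    mask = assemble_mask(prototypes, coeffs)
    # Assumption: the head input is the coefficients followed by the flat mask.
    features = np.concatenate([coeffs, mask.ravel()])
    return head(features)


# Hypothetical shapes: an 8x8 mask grid built from 4 prototypes.
rng = np.random.default_rng(1)
protos = rng.normal(size=(8, 8, 4))
coeffs = rng.normal(size=4)
head = DimensionHead(in_features=4 + 8 * 8)
print(estimate_dimensions(protos, coeffs, head).shape)  # (2,)
```

In a trained system the prototypes and coefficients would come from the segmentation network rather than a random generator, and the head's weights would be fitted against ground-truth dimensions.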

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
