CV, HC | Feb 28, 2020

Hand-Priming in Object Localization for Assistive Egocentric Vision

arXiv:2002.12557v1 | 20 citations
AI Analysis

This paper addresses the challenge of object localization in assistive technology for people with visual impairments. It represents an incremental improvement that leverages hand interactions as a reliable cue.

The paper tackles the problem of identifying the object of interest in egocentric vision for visually impaired users. It proposes hand-priming models that use the presence of the hand as contextual information to localize the object, achieving higher precision than alternatives such as fine-tuning, multi-class, and multi-task learning.

Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users often tend to include their hand, either interacting with the object that they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed to either the entire localization network or its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves higher precision than other approaches, such as fine-tuning, multi-class, and multi-task learning, which also encode hand-object interactions in localization.
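The two injection points named in the abstract (the whole network versus only its last convolutional layers) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the hand segmentation is attached by channel-wise concatenation, uses plain Python lists instead of a deep learning framework, and replaces the learned final layers with a hand-picked weighted sum. All function names and weights here are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]   # one channel, H x W
Tensor = List[Grid]        # C x H x W


def downsample_mask(mask: Grid, factor: int) -> Grid:
    """Average-pool a hand mask so it matches a coarser feature map."""
    h, w = len(mask) // factor, len(mask[0]) // factor
    return [
        [
            sum(mask[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def prime_with_hand(features: Tensor, hand_mask: Grid) -> Tensor:
    """Append the hand mask as one extra channel (assumed fusion: concatenation)."""
    if (len(hand_mask) != len(features[0])
            or len(hand_mask[0]) != len(features[0][0])):
        raise ValueError("hand mask and features must share spatial size")
    return features + [hand_mask]


def conv1x1(features: Tensor, weights: List[float]) -> Grid:
    """Toy stand-in for the last conv layers: a weighted sum over channels."""
    h, w = len(features[0]), len(features[0][0])
    return [
        [sum(wt * ch[i][j] for wt, ch in zip(weights, features)) for j in range(w)]
        for i in range(h)
    ]


def peak(heatmap: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest response, read as the object center."""
    cells = ((i, j) for i in range(len(heatmap)) for j in range(len(heatmap[0])))
    return max(cells, key=lambda p: heatmap[p[0]][p[1]])


# A 4x4 hand segmentation with the hand in the bottom-right quadrant.
hand_mask = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]

# Early priming: the mask joins the RGB input, so the whole network sees it.
rgb = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
early_input = prime_with_hand(rgb, hand_mask)          # 4 channels

# Late priming: the mask joins a coarser 2x2 feature map before the last layers.
# The feature map has two candidate objects; the top-left one is stronger.
features = [[[0.9, 0.1], [0.2, 0.8]]]
late_input = prime_with_hand(features, downsample_mask(hand_mask, 2))

unprimed_center = peak(conv1x1(features, [1.0]))       # (0, 0)
primed_center = peak(conv1x1(late_input, [1.0, 0.5]))  # (1, 1)
```

In this toy setup the unprimed heatmap peaks on the stronger but irrelevant object, while the hand channel shifts the peak to the candidate next to the hand. In a trained model the weights would be learned from data rather than fixed by hand.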

