CV · AI · Jun 3, 2019

How Much Does Audio Matter to Recognize Egocentric Object Interactions?

arXiv:1906.00634v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the under-explored use of audio for egocentric action recognition, which could benefit applications in assistive technologies or human-computer interaction, though it is incremental as it builds on existing benchmarks.

The paper tackles the problem of recognizing egocentric object interactions by proposing an audio-only model. Using a comparatively lighter architecture, the model achieves a verb classification accuracy of 34.26% on a standard benchmark, which is competitive with vision-based systems.

Sounds are an important source of information about our daily interactions with objects. For instance, a significant number of people can discern the temperature of water that is being poured just by using the sense of hearing. However, only a few works have explored the use of audio for the classification of object interactions, either in conjunction with vision or as a single modality. In this preliminary work, we propose an audio model for egocentric action recognition and explore its usefulness on the parts of the problem (noun, verb, and action classification). Our model achieves a competitive result in terms of verb classification (34.26% accuracy) on a standard benchmark with respect to vision-based state-of-the-art systems, using a comparatively lighter architecture.
