SD AS · Mar 30, 2021

Audio classification of the content of food containers and drinking glasses

arXiv:2103.15999v2 · 9 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific problem in audio-based object recognition, relevant to applications such as assistive technology and robotics. The contribution is incremental, as it builds on existing datasets and methods.

The paper classifies the type and amount of content in food containers and drinking glasses from sound by decomposing the task into two steps: action recognition followed by content classification. The proposed model achieves weighted average F1 scores of 76.02, 78.24, and 41.89 on the three test sets, outperforming baselines.
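The results above are reported as weighted average F1 scores. As a hedged illustration (not the authors' evaluation code), the sketch below computes the F1 score of each class and averages them, weighting each class by its number of occurrences in the ground truth. This is the same definition scikit-learn uses for `f1_score(..., average="weighted")`. The class labels in the usage example are hypothetical, and the reported scores appear to be this quantity expressed on a 0 to 100 scale.

```python
from collections import Counter
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 of a single class, treating it as positive and all other classes as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average of per-class F1, weighted by how often each class occurs in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(count / n * f1_for_class(y_true, y_pred, label) for label, count in support.items())


# Hypothetical labels, for illustration only.
y_true = ["empty", "empty", "water-half", "rice-full"]
y_pred = ["empty", "water-half", "water-half", "rice-full"]
score = 100 * weighted_f1(y_true, y_pred)  # scaled to 0-100 like the reported scores
print(f"{score:.2f}")  # 75.00
```

Weighting by support means frequent classes dominate the average, so a model can score well overall while still doing poorly on rare content classes.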

Food containers, drinking glasses and cups handled by a person generate sounds that vary with the type and amount of their content. In this paper, we propose a new model for sound-based classification of the type and amount of content in a container. The proposed model is based on the decomposition of the problem into two steps, namely action recognition and content classification. We use the scenario of the recent CORSMAL Containers Manipulation dataset and consider two actions (shaking and pouring), and seven combinations of material and filling level. The first step identifies the action performed by a person with the container. The second step determines the amount and type of content using an action-specific classifier. Experiments show that the proposed model achieves weighted average F1 scores of 76.02, 78.24, and 41.89 on the three test sets, respectively, and outperforms baselines and existing approaches that classify the content amount and type either independently or jointly.
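The two-step decomposition can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's model: each recording is assumed to be already reduced to a fixed-length feature vector, and a nearest-centroid classifier (a hypothetical stand-in, not the classifier used in the paper) is used for both steps. The action labels follow the abstract (shaking, pouring); the content labels, class names such as `TwoStepContentClassifier`, and the toy feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


class NearestCentroid:
    """Hypothetical stand-in classifier: predicts the label whose mean training vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = {}
        for x, y in zip(xs, ys):
            if y not in sums:
                sums[y] = [0.0] * len(x)
                counts[y] = 0
            sums[y] = [s + v for s, v in zip(sums[y], x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        return self

    def predict(self, x: Vector) -> str:
        # Euclidean distance to each class mean; the closest one wins.
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


class TwoStepContentClassifier:
    """Step 1 recognizes the action; step 2 applies a content classifier trained for that action."""

    def __init__(self) -> None:
        self.action_clf = NearestCentroid()
        self.content_clfs: Dict[str, NearestCentroid] = {}

    def fit(self, xs, actions, contents) -> "TwoStepContentClassifier":
        self.action_clf.fit(xs, actions)
        for action in set(actions):
            # Each content classifier sees only recordings of its own action.
            idx = [i for i, a in enumerate(actions) if a == action]
            self.content_clfs[action] = NearestCentroid().fit(
                [xs[i] for i in idx], [contents[i] for i in idx]
            )
        return self

    def predict(self, x: Vector) -> Tuple[str, str]:
        action = self.action_clf.predict(x)
        return action, self.content_clfs[action].predict(x)


# Toy 2-D feature vectors (hypothetical values and content labels).
xs = [[0, 0], [0, 1], [0, 0.1], [0, 0.9], [10, 0], [10, 5], [10, 0.2], [10, 5.1]]
actions = ["shaking"] * 4 + ["pouring"] * 4
contents = ["rice-half", "pasta-full", "rice-half", "pasta-full",
            "water-half", "water-full", "water-half", "water-full"]
model = TwoStepContentClassifier().fit(xs, actions, contents)
print(model.predict([0.1, 0.05]))  # ('shaking', 'rice-half')
print(model.predict([9.9, 4.9]))   # ('pouring', 'water-full')
```

The point of the decomposition is that each content classifier only sees recordings of one action, so it can specialize in the acoustic cues that action produces. The trade-off is that an error in the first step propagates to the second, because the wrong action-specific classifier is then applied.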

Code Implementations: 1 repo
