SD AS, Mar 30, 2021

Audio classification of the content of food containers and drinking glasses

Santiago Donaher, Alessio Xompero, Andrea Cavallaro

arXiv:2103.15999v28.69 citations, Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses a specific problem in audio-based object recognition, with applications such as assistive technology and robotics, but the contribution is incremental because it builds on existing datasets and methods.

The paper classifies the type and amount of content in food containers and drinking glasses from sound, decomposing the problem into two steps: action recognition, then content classification. The proposed model achieves weighted average F1 scores of 76.02, 78.24, and 41.89 on the three test sets, respectively, outperforming the baselines.
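For readers unfamiliar with the metric, the following is a minimal sketch of a weighted average F1 score: the per-class F1 is averaged with each class weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are assumptions for illustration only; this is the standard definition, not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels (hypothetical): class "a" has F1 = 2/3, class "b" has F1 = 4/5
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class support, frequent classes dominate the score, which matters when the content classes are unevenly represented across test sets.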

Food containers, drinking glasses and cups handled by a person generate sounds that vary with the type and amount of their content. In this paper, we propose a new model for sound-based classification of the type and amount of content in a container. The proposed model is based on the decomposition of the problem into two steps, namely action recognition and content classification. We use the scenario of the recent CORSMAL Containers Manipulation dataset and consider two actions (shaking and pouring), and seven combinations of material and filling level. The first step identifies the action performed by a person with the container. The second step determines the amount and type of content using an action-specific classifier. Experiments show that the proposed model achieves 76.02, 78.24, and 41.89 weighted average F1 score on the three test sets, respectively, and outperforms baselines and existing approaches that classify the content amount and type either independently or jointly.
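The two-step decomposition described in the abstract can be sketched as a cascade: one classifier picks the action, and its output selects which content classifier to run. Everything named here (`classify_content`, the toy classifiers, the thresholds, and the placeholder class labels) is hypothetical; the abstract does not specify the features or model architecture, so this only illustrates the control flow.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_content(
    features: Features,
    action_classifier: Classifier,
    content_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Step 1: recognise the action. Step 2: run that action's content classifier."""
    action = action_classifier(features)
    if action not in content_classifiers:
        raise KeyError(f"no content classifier registered for action {action!r}")
    content = content_classifiers[action](features)
    return action, content


# Toy stand-ins (assumptions for illustration, not the paper's models)
def toy_action_classifier(features: Features) -> str:
    return "shaking" if features[0] > 0.5 else "pouring"


toy_content_classifiers: Dict[str, Classifier] = {
    # Placeholder labels; the real task has seven type-and-level classes
    "shaking": lambda f: "class_a" if f[1] > 0.5 else "class_b",
    "pouring": lambda f: "class_c" if f[1] > 0.5 else "class_d",
}

print(classify_content([0.9, 0.9], toy_action_classifier, toy_content_classifiers))
print(classify_content([0.1, 0.2], toy_action_classifier, toy_content_classifiers))
```

One consequence of this design is that an error in the first step routes the sample to the wrong content classifier, so the accuracy of action recognition bounds the accuracy of the whole cascade.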
