CVLGMLJun 29, 2018

It All Matters: Reporting Accuracy, Inference Time and Power Consumption for Face Emotion Recognition on Embedded Systems

arXiv:1807.00046v1
Originality Synthesis-oriented
AI Analysis

It addresses the gap in reporting practical metrics for embedded systems, which is crucial for developers and researchers aiming to deploy emotion recognition in real-world human-computer interaction, though it is incremental in emphasizing existing metrics.

This paper tackles the problem of evaluating face emotion recognition systems for embedded environments by reporting not only detection accuracy but also inference time and power consumption, finding that most existing systems are limited by these factors in real-life applications.

While several approaches to face emotion recognition task are proposed in literature, none of them reports on power consumption nor inference time required to run the system in an embedded environment. Without adequate knowledge about these factors it is not clear whether we are actually able to provide accurate face emotion recognition in the embedded environment or not, and if not, how far we are from making it feasible and what are the biggest bottlenecks we face. The main goal of this paper is to answer these questions and to convey the message that instead of reporting only detection accuracy also power consumption and inference time should be reported as real usability of the proposed systems and their adoption in human computer interaction strongly depends on it. In this paper, we identify the state-of-the art face emotion recognition methods that are potentially suitable for embedded environment and the most frequently used datasets for this task. Our study shows that most of the performed experiments use datasets with posed expressions or in a particular experimental setup with special conditions for image collection. Since our goal is to evaluate the performance of the identified promising methods in the realistic scenario, we collect a new dataset with non-exaggerated emotions and we use it, in addition to the publicly available datasets, for the evaluation of detection accuracy, power consumption and inference time on three frequently used embedded devices with different computational capabilities. Our results show that gray images are still more suitable for embedded environment than color ones and that for most of the analyzed systems either inference time or energy consumption or both are limiting factor for their adoption in real-life embedded applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes