CV | Nov 9, 2023

Let's Get the FACS Straight -- Reconstructing Obstructed Facial Features

arXiv:2311.05221v2 | 5 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the need for robust facial analysis in applications such as human-computer interaction and medical monitoring, without repeated fine-tuning for each task. It is incremental, however, because it adapts existing methods to a specific domain.

The paper tackles facial analysis when parts of the face are obstructed, for example by sensors. It reconstructs the obstructed features with a CycleGAN-based style transfer approach and achieves scores similar to those of unobstructed videos on tasks such as Facial Action Unit computation and emotion detection.

The human face is one of the most crucial parts of interhuman communication. Even when parts of the face are hidden or obstructed, the underlying facial movements can be understood. Machine learning approaches often fail in that regard due to the complexity of the facial structures. To alleviate this problem, a common approach is to fine-tune a model for such a specific application. However, this is computationally intensive and might have to be repeated for each desired analysis task. In this paper, we propose to reconstruct obstructed facial parts to avoid the task of repeated fine-tuning. As a result, existing facial analysis methods can be used without further changes with respect to the data. In our approach, the restoration of facial features is interpreted as a style transfer task between different recording setups. By using the CycleGAN architecture, the requirement of matched pairs, which is often hard to fulfill, can be eliminated. To prove the viability of our approach, we compare our reconstructions with real unobstructed recordings. We created a novel data set in which 36 test subjects were recorded both with and without 62 surface electromyography sensors attached to their faces. In our evaluation, we feature typical facial analysis tasks, like the computation of Facial Action Units and the detection of emotions. To further assess the quality of the restoration, we also compare perceptual distances. We show that scores similar to those of the videos without obstructing sensors can be achieved.
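
The abstract notes that CycleGAN removes the need for matched pairs. The idea behind that can be illustrated with the cycle-consistency term: each image is translated to the other domain and back, then compared with itself, so no frame with sensors ever has to be aligned with a frame without them. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `remove_sensors`, `add_sensors`, and `cycle_consistency_loss` are hypothetical, the "images" are short lists of floats, and the generators are toy functions standing in for trained networks. A full CycleGAN also trains with adversarial losses, which are omitted here.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]                     # toy stand-in for a flattened image
Generator = Callable[[Image], List[float]]  # maps an image to the other domain


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    to_clean: Generator,
    to_obstructed: Generator,
    obstructed_batch: Sequence[Image],
    clean_batch: Sequence[Image],
) -> float:
    """Sum of the mean L1 errors of both translation cycles.

    Each image is only compared with its own round-trip reconstruction,
    so the two batches do not need to be paired or equally sized.
    """
    forward = [l1_distance(to_obstructed(to_clean(x)), x) for x in obstructed_batch]
    backward = [l1_distance(to_clean(to_obstructed(y)), y) for y in clean_batch]
    return sum(forward) / len(forward) + sum(backward) / len(backward)


# Hypothetical toy generators: "sensors" are modelled as a constant offset.
def remove_sensors(img: Image) -> List[float]:
    return [p - 0.5 for p in img]


def add_sensors(img: Image) -> List[float]:
    return [p + 0.5 for p in img]


# Unpaired batches: two obstructed frames, one unrelated clean frame.
obstructed = [[1.0, 1.5], [2.0, 0.5]]
clean = [[0.25, 0.75]]

loss = cycle_consistency_loss(remove_sensors, add_sensors, obstructed, clean)
print(loss)
```

Because the two toy generators are exact inverses, the round trip reproduces every input and the loss is zero. A generator pair that does not invert cleanly yields a positive loss, which is the signal that pushes training toward reconstructions that preserve the underlying face.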

