CV | AI | Apr 22, 2024

Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification

CMU
arXiv:2404.14606v232 citations | h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses a practical problem for computer vision applications in public health and security, but the advance is incremental because it builds on existing transformer architectures.

The paper tackles the challenge of facial expression recognition when masks are worn by proposing a unified multi-branch vision transformer that handles both facial expression and mask classification tasks, achieving performance comparable to or better than state-of-the-art methods.

With mask wearing becoming a new cultural norm, facial expression recognition (FER) that takes masks into account has become a significant challenge. In this paper, we propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks. Our approach extracts shared features for both tasks using a dual-branch architecture that obtains multi-scale feature representations. Furthermore, we propose a cross-task fusion phase that processes the tokens for each task in separate branches while exchanging information through a cross-attention module. Through this simple yet effective cross-task fusion phase, our proposed framework reduces overall complexity compared with using a separate network for each task. Extensive experiments demonstrate that our proposed model performs better than or on par with different state-of-the-art methods on both the facial expression recognition and facial mask wearing classification tasks.
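
The cross-task fusion idea in the abstract can be illustrated with a minimal sketch in which each task branch's tokens attend over the other branch's tokens. This is not the authors' implementation. The function names (`cross_attention`, `cross_task_fusion`), the identity query/key/value projections, the single attention head, the residual connection, and the token counts are all assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token attends over the context tokens of the other branch.

    Assumption: identity Q/K/V projections and a single head. A real module
    would use learned linear projections and usually several heads.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of the context tokens.
        attended = [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        # Residual connection keeps the branch's own token information.
        fused.append([x + y for x, y in zip(q, attended)])
    return fused


def cross_task_fusion(expr_tokens, mask_tokens):
    """Exchange information between the two task branches in both directions."""
    new_expr = cross_attention(expr_tokens, mask_tokens)
    new_mask = cross_attention(mask_tokens, expr_tokens)
    return new_expr, new_mask


random.seed(0)
dim = 8  # hypothetical embedding size
expr_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
mask_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
fused_expr, fused_mask = cross_task_fusion(expr_tokens, mask_tokens)
print(len(fused_expr), len(fused_expr[0]), len(fused_mask), len(fused_mask[0]))
```

Each branch keeps its own token count and embedding size after fusion, so task-specific classification heads could be attached to either output without reshaping.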
