CV · Feb 14, 2024

Headset: Human emotion awareness under partial occlusions multimodal dataset

arXiv:2402.09107v1 · 7 citations · h-index: 5 · IEEE Trans Vis Comput Graph
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for ethically compliant and diverse volumetric data for XR applications, but its contribution is incremental because it builds on existing data collection methods.

The authors introduce HEADSET, a multimodal dataset with volumetric captures of 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs), to advance immersive technologies. They evaluate it on facial expression classification, HMD removal, and point cloud reconstruction, and state that it will be made publicly available for research.

The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio with 31 synchronized modules comprising 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to provide light field (LF) data simultaneously. Finally, we evaluate the use of our dataset on the tasks of facial expression classification, HMD removal, and point cloud reconstruction. The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video. HEADSET and all its associated raw data and license agreement will be publicly available for research purposes.
