CV · Dec 2, 2021

Attention based Occlusion Removal for Hybrid Telepresence Systems

arXiv:2112.01098v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the lack of immersiveness in VR telepresence systems by restoring the user's facial appearance behind the headset. The advance is incremental, since it builds on existing animation and 3D face reconstruction pipelines.

The paper tackles the problem of HMDs blocking facial expressions in VR telepresence by proposing an attention-based encoder-decoder architecture for de-occlusion. The authors report superior qualitative and quantitative results over state-of-the-art methods.

Traditionally, video conferencing is a widely adopted solution for telecommunication, but it inherently lacks immersiveness because of the 2D nature of facial representation. The integration of Virtual Reality (VR) in a communication/telepresence system through Head Mounted Displays (HMDs) promises to provide users a much more immersive experience. However, HMDs hinder communication by blocking the facial appearance and expressions of the user. To overcome these issues, we propose a novel attention-enabled encoder-decoder architecture for HMD de-occlusion. We also propose to train our person-specific model using short videos (1-2 minutes) of the user, captured in varying appearances, and demonstrate generalization to unseen poses and appearances of the user. We report superior qualitative and quantitative results over state-of-the-art methods. We also present applications of this approach to hybrid video teleconferencing using existing animation and 3D face reconstruction pipelines.

