CV · Mar 21, 2022

Generative Adversarial Network for Future Hand Segmentation from Egocentric Video

arXiv:2203.11305v2 · 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of anticipating hand movements in wearable camera systems. The advance is incremental because it builds on existing segmentation and GAN techniques for a specific application.

The paper predicts future hand masks from egocentric video by modeling stochastic head motion, and reports more accurate predictions than previous state-of-the-art future image segmentation methods on the EPIC-Kitchens and EGTEA Gaze+ datasets.

We introduce the novel problem of anticipating a time series of future hand masks from egocentric video. A key challenge is to model the stochasticity of future head motions, which globally affect the analysis of head-worn camera video. To this end, we propose a novel deep generative model -- EgoGAN, which uses a 3D Fully Convolutional Network to learn a spatio-temporal video representation for pixel-wise visual anticipation, generates future head motion using a Generative Adversarial Network (GAN), and then predicts the future hand masks based on the video representation and the generated future head motion. We evaluate our method on both the EPIC-Kitchens and the EGTEA Gaze+ datasets. We conduct detailed ablation studies to validate the design choices of our approach. Furthermore, we compare our method with previous state-of-the-art methods on future image segmentation and show that our method can more accurately predict future hand masks.
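The three stages named in the abstract (encode the clip, sample a future head motion, predict masks conditioned on both) can be illustrated with a toy data-flow sketch. This is not the EgoGAN implementation: every function name here is hypothetical, the encoder is a plain temporal average rather than a 3D Fully Convolutional Network, the "generator" is a random draw rather than a trained GAN, and the mask predictor simply shifts a thresholded mask by the sampled motion. It only shows why different motion samples lead to different future masks.

```python
import random
from typing import List, Tuple

Frame = List[List[float]]   # H x W grayscale frame (toy stand-in for video)
Mask = List[List[int]]      # H x W binary hand mask
Motion = Tuple[int, int]    # (dx, dy) apparent image shift in pixels


def encode_video(frames: List[Frame]) -> Frame:
    """Toy stand-in for the 3D FCN encoder: average the clip over time."""
    t = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / t for x in range(w)]
            for y in range(h)]


def generate_head_motion(features: Frame, steps: int,
                         rng: random.Random) -> List[Motion]:
    """Toy stand-in for the GAN generator: one random draw gives one
    plausible motion sequence. A real generator would condition on
    `features`; this sketch ignores them (assumption for brevity)."""
    return [(rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1]))
            for _ in range(steps)]


def shift(mask: Mask, dx: int, dy: int) -> Mask:
    """Translate a mask by (dx, dy), filling uncovered pixels with 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = mask[sy][sx]
    return out


def predict_future_masks(features: Frame, motions: List[Motion],
                         threshold: float = 0.5) -> List[Mask]:
    """Toy mask predictor: threshold the features, then apply each
    sampled motion step cumulatively to get one mask per future step."""
    mask = [[1 if v > threshold else 0 for v in row] for row in features]
    futures = []
    for dx, dy in motions:
        mask = shift(mask, dx, dy)
        futures.append(mask)
    return futures


def anticipate(frames: List[Frame], steps: int, seed: int) -> List[Mask]:
    """Run the three toy stages end to end for one noise sample (seed)."""
    rng = random.Random(seed)
    features = encode_video(frames)
    motions = generate_head_motion(features, steps, rng)
    return predict_future_masks(features, motions)
```

Calling `anticipate(frames, steps=3, seed=s)` with different values of `s` plays the role of drawing different noise vectors: each seed yields one candidate sequence of future masks, which is the stochastic behaviour the abstract attributes to future head motion.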

Code Implementations: 1 repo
