Left/Right Hand Segmentation in Egocentric Videos
This work addresses left/right hand segmentation for user-machine interaction with wearable cameras. The contribution is incremental: it extends existing background-foreground methods rather than replacing them.
The paper tackles the segmentation of left and right hands in egocentric videos, a problem complicated by close hand interactions and hand-to-hand occlusions. It extends traditional background-foreground methods with a hand-identification step and temporal superpixels, which improves segmentation reliability.
Wearable cameras allow people to record their daily activities from a user-centered (First Person Vision) perspective. Due to their favorable location, wearable cameras frequently capture the hands of the user, and may thus represent a promising user-machine interaction tool for different applications. Existing First Person Vision methods handle hand segmentation as a background-foreground problem, ignoring two important facts: i) hands are not a single "skin-like" moving element, but a pair of interacting, cooperative entities; ii) close hand interactions may lead to hand-to-hand occlusions and, as a consequence, create a single hand-like segment. These facts complicate a proper understanding of hand movements and interactions. Our approach extends traditional background-foreground strategies by including a hand-identification step (left/right) based on a Maxwell distribution of angle and position. Hand-to-hand occlusions are addressed by exploiting temporal superpixels. The experimental results show that, in addition to providing a reliable left/right hand segmentation, our approach considerably improves on traditional background-foreground hand segmentation.
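To make the hand-identification step more concrete, the following is a minimal sketch of how a left/right decision could be scored with a Maxwell density over angle and position. It is an illustration under stated assumptions, not the method as implemented in the paper: we assume each hand-like segment is summarized by an orientation angle in degrees (0 to 180) and a horizontal position normalized to [0, 1], that the right-hand model is the mirror image of the left-hand model, and that the two features are scored independently. The function names and the scale values `ANGLE_SCALE` and `POS_SCALE` are hypothetical.

```python
import math


def maxwell_pdf(x, scale):
    """Maxwell density with scale parameter `scale`; zero for negative x."""
    if x < 0:
        return 0.0
    norm = math.sqrt(2.0 / math.pi) / scale**3
    return norm * x**2 * math.exp(-x**2 / (2.0 * scale**2))


# Hypothetical scale parameters, chosen only for illustration.
# In practice they would be fitted to annotated data.
ANGLE_SCALE = 45.0  # degrees
POS_SCALE = 0.25    # fraction of the frame width


def left_right_scores(angle_deg, x_norm):
    """Score one segment under a left-hand model and its mirrored right-hand model.

    Assumption: the left model measures angle and position from the left
    side of the frame, and the right model measures them from the right.
    """
    left = maxwell_pdf(angle_deg, ANGLE_SCALE) * maxwell_pdf(x_norm, POS_SCALE)
    right = (maxwell_pdf(180.0 - angle_deg, ANGLE_SCALE)
             * maxwell_pdf(1.0 - x_norm, POS_SCALE))
    return left, right


def identify_hand(angle_deg, x_norm):
    """Return the label whose model gives the higher score (ties go to left)."""
    left, right = left_right_scores(angle_deg, x_norm)
    return "left" if left >= right else "right"


if __name__ == "__main__":
    # A segment near the left side versus its mirror image on the right.
    print(identify_hand(60.0, 0.3))
    print(identify_hand(120.0, 0.7))
```

Multiplying the two densities treats angle and position as independent, which is a simplification made here for brevity. A joint model, or a comparison against a background class for non-hand segments, would be natural refinements.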