Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze
This work addresses the problem of activity recognition and video retrieval for egocentric video analysis, presenting an incremental method that builds on existing gaze-based approaches.
The paper tackles unsupervised segmentation of action segments in egocentric videos by using gaze to identify regions-of-interest and tracking motion parameters to find temporal cuts, with results evaluated on the BRISGAZE-ACTIONS dataset and improved using entropy measures.
Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments enables faster and less noisy matching. However, there exists a substantial gap in machine understanding of natural temporal cuts during continuous human activity. This work reports on a novel gaze-based approach for segmenting action segments in videos captured using an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present several results using combinations of the two parameters on the BRISGAZE-ACTIONS dataset, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.
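The pipeline described above (gaze-centred region-of-interest, two motion parameters tracked across successive regions, temporal cuts from their combination) can be illustrated with a minimal sketch. The abstract does not name the two parameters, so the ones used here, gaze displacement and change in mean region intensity, are stand-in assumptions, as are the function names, the thresholds, and the "and"/"or" combination rule. The two entropy measures are not specified in the abstract and are not sketched.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # grayscale image: rows of pixel intensities
Gaze = Tuple[int, int]             # (x, y) gaze position, assumed to lie inside the frame


def crop_roi(frame: Frame, gaze: Gaze, half: int) -> List[List[float]]:
    """Return the square patch centred on the gaze point, clipped to the frame."""
    h, w = len(frame), len(frame[0])
    x, y = gaze
    x0, x1 = max(0, x - half), min(w, x + half + 1)
    y0, y1 = max(0, y - half), min(h, y + half + 1)
    return [list(row[x0:x1]) for row in frame[y0:y1]]


def mean_intensity(patch: Sequence[Sequence[float]]) -> float:
    """Average pixel value of a non-empty patch."""
    total = sum(sum(row) for row in patch)
    count = sum(len(row) for row in patch)
    return total / count


def motion_parameters(frames: Sequence[Frame], gazes: Sequence[Gaze],
                      half: int = 1) -> List[Tuple[float, float]]:
    """For each pair of successive frames, return two hypothetical parameters:
    (gaze displacement in pixels, absolute change in mean ROI intensity)."""
    params = []
    prev_mean = mean_intensity(crop_roi(frames[0], gazes[0], half))
    for t in range(1, len(frames)):
        cur_mean = mean_intensity(crop_roi(frames[t], gazes[t], half))
        dx = gazes[t][0] - gazes[t - 1][0]
        dy = gazes[t][1] - gazes[t - 1][1]
        params.append(((dx * dx + dy * dy) ** 0.5, abs(cur_mean - prev_mean)))
        prev_mean = cur_mean
    return params


def temporal_cuts(params: Sequence[Tuple[float, float]], shift_thresh: float,
                  change_thresh: float, mode: str = "and") -> List[int]:
    """Return frame indices at which a cut is placed (the cut precedes that frame).
    `mode` selects how the two parameters are combined (assumed rule)."""
    cuts = []
    for i, (shift, change) in enumerate(params):
        a, b = shift > shift_thresh, change > change_thresh
        if (a and b) if mode == "and" else (a or b):
            cuts.append(i + 1)
    return cuts
```

Calling `temporal_cuts(motion_parameters(frames, gazes), shift_thresh, change_thresh, mode)` with `mode="and"` places a cut only where both parameters exceed their thresholds, while `mode="or"` places one where either does; this is one simple way to explore combinations of two parameters and is not claimed to be the combination scheme evaluated in the paper.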