Enhancing temporal segmentation by nonlocal self-similarity
This work addresses event segmentation in egocentric photo-streams for computer vision applications, representing an incremental improvement over existing methods.
The paper tackles temporal segmentation of photo-streams by encoding long-range temporal dependencies with nonlocal self-similarity functions, which yields an average F-measure increase of 3.71% over the state of the art on the EDUB-Seg dataset.
Temporal segmentation of untrimmed videos and photo-streams is an active area of research in computer vision and image processing. This paper proposes a new approach to improve the temporal segmentation of photo-streams. The method enhances image representations by encoding long-range temporal dependencies. Our key contribution is to exploit the temporal stationarity assumption of photo-streams and model each frame by its nonlocal self-similarity function. The proposed approach is evaluated on the EDUB-Seg dataset, a standard benchmark for egocentric photo-stream temporal segmentation. Starting from seven different CNN-based image features, the method yields consistent improvements in event segmentation quality, leading to an average F-measure increase of 3.71% with respect to the state of the art.
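As an illustration of what modeling each frame by its nonlocal self-similarity function can look like, the sketch below compares every frame's feature vector with those of all other frames in the stream and uses the resulting similarity profile as that frame's representation. The abstract does not state the exact formulation, so the Gaussian kernel, the `bandwidth` parameter, the function name `nonlocal_self_similarity`, and the toy two-dimensional features are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def nonlocal_self_similarity(features, bandwidth=1.0):
    """Return a (T, T) matrix whose row i is the similarity profile of frame i.

    Hypothetical helper: the Gaussian kernel and `bandwidth` are assumptions,
    not taken from the paper.
    """
    X = np.asarray(features, dtype=float)
    # Squared Euclidean distance between every pair of frames.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    # Gaussian kernel: 1 for identical frames, decaying with distance.
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


# Toy photo-stream: frames 0-2 belong to one event, frames 3-5 to another.
stream = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.0, 0.9],
])
S = nonlocal_self_similarity(stream, bandwidth=0.5)
print(np.round(S, 2))
```

In this toy case, frames from the same event have nearly identical rows of `S`, while rows from different events differ sharply, so a change in the similarity profile between consecutive frames hints at an event boundary.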