CV · Oct 2, 2023

Segmenting the motion components of a video: A long-term unsupervised model

arXiv:2310.01040v3 · 1 citation · h-index: 46
Originality: Incremental advance
AI Analysis

This work addresses video analysis for computer vision applications. It is an incremental advance: it builds on existing motion segmentation methods and focuses on temporal consistency.

The paper tackles unsupervised motion segmentation in videos. It proposes a long-term spatio-temporal model that takes optical flow fields as input and outputs coherent motion segments, achieving competitive results on four video object segmentation (VOS) benchmarks.

Human beings can continuously analyze a video and immediately extract its motion components. We want to adopt this paradigm to provide a coherent and stable motion segmentation over the video sequence. To this end, we propose a novel long-term spatio-temporal model that operates in a fully unsupervised way. It takes as input a volume of consecutive optical flow (OF) fields and delivers a volume of segments of coherent motion over the video. More specifically, we have designed a transformer-based network and derive its loss function from a mathematically well-founded framework, the Evidence Lower Bound (ELBO). The loss function has two terms: a flow reconstruction term based on spatio-temporal parametric motion models that combine, in a novel way, polynomial (quadratic) motion models for the spatial dimensions with B-splines for the time dimension of the video sequence; and a regularization term that enforces temporal consistency of the segments. We report experiments on four VOS benchmarks, demonstrating competitive quantitative results while performing motion segmentation on a whole sequence in one go. We also highlight, through visual results, the key contributions of our method to temporal consistency.
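To make the spatio-temporal parametric motion model more concrete, here is a minimal NumPy sketch of one way a flow field can be quadratic in the image coordinates while its coefficients vary over time through a B-spline. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bspline_basis`, `quadratic_features`, `flow_model`), the clamped cubic knot vector, the number of control points, and the normalized coordinate grid are all hypothetical choices. The sketch covers a single motion model only and does not reproduce the network, the segment masks, or the ELBO-derived loss.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """All B-spline basis functions at times t (Cox-de Boor recursion).
    Returns an array of shape (len(t), len(knots) - degree - 1)."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator function of each half-open knot interval.
    B = np.zeros((len(t), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Close the last non-empty interval so that t == knots[-1] is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[t == knots[-1], last] = 1.0
    # Raise the degree step by step; skip terms with zero-width support.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + d + 1] - t) / right * B[:, i + 1]
        B = B_new
    return B

def quadratic_features(x, y):
    """Monomials up to degree 2 in (x, y): six per flow component."""
    return np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=-1)

def flow_model(theta, x, y, t, knots, degree=3):
    """theta: (n_basis, 2, 6) control points of the quadratic coefficients.
    x, y: (H, W) coordinate grids; t: (T,) time stamps.
    Returns the parametric flow, shape (T, H, W, 2)."""
    phi = quadratic_features(x, y)                 # (H, W, 6)
    B = bspline_basis(t, knots, degree)            # (T, n_basis)
    theta_t = np.einsum("tn,ncm->tcm", B, theta)   # coefficients at each time
    return np.einsum("tcm,hwm->thwc", theta_t, phi)

# Toy setup (all sizes are assumptions chosen for illustration).
T, H, W, degree, n_basis = 8, 4, 5, 3, 5
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
t = np.linspace(0.0, 1.0, T)
# Clamped knot vector: end knots repeated so the spline spans [0, 1].
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, n_basis - degree + 1),
                        np.ones(degree)])

rng = np.random.default_rng(0)
theta0 = rng.normal(size=(2, 6))
# Identical control points: the B-spline reduces to a time-constant model.
theta = np.tile(theta0, (n_basis, 1, 1))
flow = flow_model(theta, xs, ys, t, knots, degree)
```

With identical control points the model collapses to a single quadratic flow repeated over all frames, because the basis functions sum to one at every time stamp. Letting the control points differ makes the quadratic coefficients evolve smoothly in time, which is the property the flow reconstruction term relies on.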
