CV
Jul 4, 2025

CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation

arXiv:2507.03539v2
3 citations
h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses unsupervised action segmentation in videos, offering a novel method to enhance segmentation accuracy without segment-level supervision, though it is incremental over prior OT-based approaches.

The paper tackles unsupervised action segmentation by proposing CLOT, a closed-loop optimal transport framework that improves segmentation through multi-level cyclic feature learning, and it reports experiments on four benchmark datasets that show the benefit of this cyclical learning.

Unsupervised action segmentation has recently pushed its limits with ASOT, an optimal transport (OT)-based method that simultaneously learns action representations and performs clustering using pseudo-labels. Unlike other OT-based approaches, ASOT makes no assumptions about action ordering and can decode a temporally consistent segmentation from a noisy cost matrix between video frames and action labels. However, the resulting segmentation lacks segment-level supervision, limiting the effectiveness of feedback between frames and action representations. To address this limitation, we propose Closed Loop Optimal Transport (CLOT), a novel OT-based framework with a multi-level cyclic feature learning mechanism. Leveraging its encoder-decoder architecture, CLOT learns pseudo-labels alongside frame and segment embeddings by solving two separate OT problems. It then refines both frame embeddings and pseudo-labels through cross-attention between the learned frame and segment embeddings, by integrating a third OT problem. Experimental results on four benchmark datasets demonstrate the benefits of cyclical learning for unsupervised action segmentation.
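The core idea of deriving pseudo-labels from a cost matrix between video frames and action labels can be illustrated with a minimal sketch. This is not the ASOT or CLOT formulation: it uses plain entropic OT solved with Sinkhorn iterations, assumes uniform marginals (so every action receives equal mass, an assumption the abstract says ASOT avoids), and omits any temporal-consistency structure. The function name `sinkhorn_pseudo_labels`, the cosine-distance cost, the random embeddings, and the values of `eps` and `n_iters` are all illustrative assumptions.

```python
import numpy as np


def sinkhorn_pseudo_labels(cost, eps=0.1, n_iters=200):
    """Entropic OT between T frames and K actions (illustrative only).

    cost: (T, K) array of frame-to-action costs.
    Returns a (T, K) transport plan; each row is a soft pseudo-label.
    """
    T, K = cost.shape
    a = np.full(T, 1.0 / T)  # frame marginal: each frame carries equal mass
    b = np.full(K, 1.0 / K)  # action marginal: uniform (a simplifying assumption)
    kernel = np.exp(-cost / eps)  # Gibbs kernel of the entropic problem
    u = np.ones(T)
    v = np.ones(K)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward the frame marginal
        v = b / (kernel.T @ u)  # rescale columns toward the action marginal
    return u[:, None] * kernel * v[None, :]


# Toy data: 12 frame embeddings and 3 action embeddings of dimension 8.
rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 8))
actions = rng.normal(size=(3, 8))

# Cosine distance as the frame-to-action cost (hypothetical choice).
f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
c = actions / np.linalg.norm(actions, axis=1, keepdims=True)
cost = 1.0 - f @ c.T

plan = sinkhorn_pseudo_labels(cost)
labels = plan.argmax(axis=1)  # hard pseudo-label per frame
```

In a training loop of the kind the abstract describes, soft pseudo-labels such as the rows of `plan` would serve as targets for the frame embeddings; CLOT, per the abstract, additionally solves OT problems involving segment embeddings and a cross-attention refinement step, which this sketch does not attempt to reproduce.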

