CV · Nov 17, 2025

Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets

arXiv:2511.13944v1 · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset reliability for researchers and practitioners in computer vision, but it is incremental, since it builds on existing clustering and data-splitting techniques.

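The paper tackles information leakage in datasets of frames extracted from videos. Because consecutive frames from the same video are often nearly identical, a random frame-level split can place near-duplicates in both the training and test sets and inflate evaluation scores. The authors propose a cluster-based frame selection strategy that yields more representative, balanced, and reliable dataset partitions.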

We propose a cluster-based frame selection strategy to mitigate information leakage in datasets built from video frames. By grouping visually similar frames before splitting the data into training, validation, and test sets, the method produces more representative, balanced, and reliable partitions.
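The core idea, grouping visually similar frames and then splitting by group so that no group straddles two partitions, can be illustrated with a small sketch. The source does not specify the frame features, the clustering algorithm, or how clusters are assigned to splits, so everything below is an assumption: frames are taken to be precomputed feature vectors, a simple greedy threshold ("leader") clustering on cosine similarity stands in for the paper's clustering step, and the names `cluster_frames`, `split_by_cluster`, and `leaked_pairs`, the 0.95 threshold, and the 70/15/15 split are hypothetical. It is not the authors' implementation.

```python
# Hypothetical sketch, NOT the paper's method: the feature vectors, leader
# clustering, threshold, and split-assignment rule are all assumptions.
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_frames(features, threshold=0.95):
    """Greedy 'leader' clustering: a frame joins the first cluster whose
    leader frame is at least `threshold` similar, else it starts a new cluster."""
    leaders, labels = [], []
    for i, f in enumerate(features):
        for cid, leader in enumerate(leaders):
            if cosine(f, features[leader]) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


def split_by_cluster(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to train/val/test so no cluster is shared between
    splits. Larger clusters go first, each into the split furthest below its
    target size (ties go to the earlier split)."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(labels):
        clusters[cid].append(idx)
    order = list(clusters)
    random.Random(seed).shuffle(order)  # randomize order among equal-size clusters
    order.sort(key=lambda cid: -len(clusters[cid]))  # stable sort keeps shuffled tie order
    targets = [f * len(labels) for f in fractions]
    splits = [[] for _ in fractions]
    for cid in order:
        k = max(range(len(splits)), key=lambda s: targets[s] - len(splits[s]))
        splits[k].extend(clusters[cid])
    return tuple(sorted(s) for s in splits)


def leaked_pairs(features, split_a, split_b, threshold=0.95):
    """Post-hoc check: frame pairs across two splits that are still near-duplicates.
    Leader clustering does not strictly guarantee this list is empty."""
    return [(i, j) for i in split_a for j in split_b
            if cosine(features[i], features[j]) >= threshold]


# Toy data (hypothetical): 10 "scenes" in a 10-D feature space with 4
# near-identical consecutive frames each, mimicking frames from one video shot.
features = [[(1.0 if d == scene else 0.0) + 0.01 * t for d in range(10)]
            for scene in range(10) for t in range(4)]
labels = cluster_frames(features)
train, val, test = split_by_cluster(labels)
print(len(set(labels)), len(train), len(val), len(test))  # 10 28 8 4
print(leaked_pairs(features, train, test))  # []
```

Assigning whole clusters rather than individual frames is what keeps near-duplicates from landing on both sides of the split. The trade-off is that split sizes only approximate the target fractions, because a cluster cannot be divided (here 28/8/4 frames instead of 28/6/6).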
