Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets
This work addresses dataset reliability for researchers and practitioners in computer vision, though the contribution is incremental: it builds on existing clustering and data-splitting techniques.
The paper tackles information leakage in datasets of video-derived frames by proposing a cluster-based frame selection strategy, which yields more representative, balanced, and reliable dataset partitions.
We propose a cluster-based frame selection strategy to mitigate information leakage in video-derived frame datasets. By grouping visually similar frames before splitting into training, validation, and test sets, the method produces more representative, balanced, and reliable dataset partitions.
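
The idea of grouping similar frames before splitting can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the features, the clustering algorithm, or the assignment rule, so everything below is an assumption made for illustration. Frames are represented by hypothetical 2-D feature vectors, clustering is a simple greedy "leader" scheme with a Euclidean threshold, and the helper names (`leader_cluster`, `split_by_cluster`) and the 60/20/20 target fractions are invented here. The one property the sketch is meant to show is that a cluster is never divided across partitions.

```python
import math
from collections import defaultdict


def leader_cluster(features, threshold):
    """Greedy leader clustering (an assumed stand-in for the paper's method).

    A frame joins the first cluster whose leader lies within `threshold`
    (Euclidean distance); otherwise it starts a new cluster.
    """
    leaders = []  # one representative feature vector per cluster
    labels = []   # cluster id for each frame, in input order
    for vec in features:
        for cid, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def split_by_cluster(labels, fractions=(("train", 0.6), ("val", 0.2), ("test", 0.2))):
    """Assign whole clusters to partitions so similar frames stay together."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(labels):
        clusters[cid].append(idx)

    total = len(labels)
    splits = {name: [] for name, _ in fractions}
    # Largest clusters first; each goes to the partition that is currently
    # furthest below its target size (ties go to the earlier partition).
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, members in ordered:
        name, _ = max(fractions, key=lambda nf: nf[1] * total - len(splits[nf[0]]))
        splits[name].extend(members)
    return splits


# Hypothetical features: five pairs of near-duplicate frames.
features = [
    (0.0, 0.0), (0.1, 0.0),
    (5.0, 5.0), (5.1, 5.0),
    (9.0, 0.0), (9.0, 0.1),
    (0.0, 9.0), (0.1, 9.0),
    (4.0, 0.0), (4.1, 0.0),
]
labels = leader_cluster(features, threshold=0.5)
splits = split_by_cluster(labels)
for name, indices in splits.items():
    print(name, indices)
```

In practice the feature vectors would come from an image embedding and the clustering step from an established algorithm. The assignment step is the part that addresses leakage: because near-duplicate frames share a cluster, they cannot land on both sides of the train/test boundary.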