CVMM · Sep 5, 2023

Prototype-based Dataset Comparison

arXiv:2309.02401v1 · 11 citations · h-index: 14 · Has Code
AI Analysis

This work supports dataset inspection by enabling comparative analysis across datasets, though it appears incremental because it builds on existing dataset summarization paradigms.

The paper addresses a limitation of single-dataset summarization, which surfaces only the most prominent concepts, by proposing a comparative approach. Concept-level prototypes, learned across datasets with self-supervised learning, enable richer dataset inspection, as demonstrated in two case studies.

Dataset summarisation is a fruitful approach to dataset inspection. However, when it is applied to a single dataset, the discovery of visual concepts is restricted to the most prominent ones. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison, we present a module that learns concept-level prototypes across datasets. We leverage self-supervised learning to discover these prototypes without supervision, and we demonstrate the benefits of our approach in two case studies. Our findings show that dataset comparison extends dataset inspection, and we hope to encourage more work in this direction. Code and usage instructions are available at https://github.com/Nanne/ProtoSim
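The abstract describes comparing datasets through concept-level prototypes shared across them. As a rough illustration of the comparison step only, the Python sketch below assumes the prototypes and the image embeddings already exist (for example, produced by some self-supervised backbone), assigns each embedding to its most similar prototype by cosine similarity, and flags prototypes used mostly by one dataset. It is not the ProtoSim implementation and does not show how the prototypes are learned; the function names and the `specific_share` threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def assign_prototypes(embeddings, prototypes):
    """Return, for each embedding, the index of its most similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda k: cosine(e, prototypes[k]))
        for e in embeddings
    ]


def prototype_frequencies(assignments, num_prototypes):
    """Fraction of a dataset's samples assigned to each prototype."""
    if not assignments:
        raise ValueError("dataset has no embeddings")
    counts = Counter(assignments)
    n = len(assignments)
    return [counts.get(k, 0) / n for k in range(num_prototypes)]


def compare_datasets(emb_a, emb_b, prototypes, specific_share=0.8):
    """Label each prototype as 'A-specific', 'B-specific', 'shared' or 'unused'.

    Hypothetical rule: a prototype is specific to one dataset when that dataset
    accounts for at least `specific_share` of its combined usage frequency.
    """
    k = len(prototypes)
    freq_a = prototype_frequencies(assign_prototypes(emb_a, prototypes), k)
    freq_b = prototype_frequencies(assign_prototypes(emb_b, prototypes), k)
    labels = []
    for fa, fb in zip(freq_a, freq_b):
        total = fa + fb
        if total == 0:
            labels.append("unused")
            continue
        share_a = fa / total
        if share_a >= specific_share:
            labels.append("A-specific")
        elif share_a <= 1 - specific_share:
            labels.append("B-specific")
        else:
            labels.append("shared")
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" and prototypes, purely for illustration.
    prototypes = [[1, 0], [0, 1], [1, 1]]
    emb_a = [[1, 0.1], [0.9, 0], [1, 1.1]]
    emb_b = [[0, 1], [0.1, 0.9], [1.1, 1]]
    print(compare_datasets(emb_a, emb_b, prototypes))
    # prints ['A-specific', 'B-specific', 'shared']
```

Comparing usage frequencies, rather than inspecting each dataset's most frequent prototypes in isolation, is what lets a less prominent concept stand out when it appears in only one of the two datasets.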

Code Implementations: 1 repo