CV
Apr 26, 2021

Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data

arXiv:2104.12673v3
78 citations
Originality: Incremental advance
AI Analysis

The paper addresses the discovery of new categories in partially labelled data, a problem of interest to machine learning and computer vision researchers, and represents an incremental improvement over existing methods.

The paper tackles novel category discovery on single- and multi-modal data by proposing an end-to-end framework that jointly learns representations and assigns clusters to unlabelled data, achieving state-of-the-art results on benchmarks like Kinetics-400, VGG-Sound, CIFAR10, CIFAR100, and ImageNet.

This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment the instance discrimination used in conventional contrastive learning approaches. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data to better predict cluster assignments. We thoroughly evaluate our framework on the large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and the image benchmarks CIFAR10, CIFAR100 and ImageNet, obtaining state-of-the-art results.
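The WTA hashing step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `wta_hash` and `pairwise_pseudo_labels`, the parameters `num_hashes`, `window` and `threshold`, and the rule "label a pair as same-category when enough hash codes agree" are all assumptions made for illustration. The abstract only states that WTA hashing on the shared representation yields pairwise pseudo labels.

```python
import numpy as np


def wta_hash(features, num_hashes=16, window=4, seed=0):
    """Return an (n, num_hashes) array of WTA codes for (n, d) features.

    Hypothetical helper: each code is the position of the largest value
    among `window` randomly chosen feature dimensions.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    codes = np.empty((n, num_hashes), dtype=np.int64)
    for h in range(num_hashes):
        # First `window` entries of a random permutation of the dimensions.
        idx = rng.permutation(d)[:window]
        # Only the rank order inside the window matters, not the magnitudes.
        codes[:, h] = np.argmax(features[:, idx], axis=1)
    return codes


def pairwise_pseudo_labels(features, threshold, **hash_kwargs):
    """Return (labels, matches) for all sample pairs.

    matches[i, j] counts the hash positions on which samples i and j agree;
    labels[i, j] is 1 when that count reaches `threshold` (an assumed rule).
    """
    codes = wta_hash(features, **hash_kwargs)
    matches = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)
    labels = (matches >= threshold).astype(np.int64)
    return labels, matches


if __name__ == "__main__":
    # Random embeddings stand in for the learnt representation.
    embeddings = np.random.default_rng(1).normal(size=(6, 32))
    labels, matches = pairwise_pseudo_labels(
        embeddings, threshold=10, num_hashes=16, window=4
    )
    print(matches)
    print(labels)
```

Because each code depends only on which dimension wins inside a window, the comparison is driven by the rank order of feature values rather than their magnitudes, which is the usual motivation for preferring WTA codes over a thresholded dot product.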

