CV | LG | Mar 23, 2023

Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data

IBM | MIT
arXiv:2303.13664v1 | 66 citations | h-index: 137
AI Analysis

This work addresses the challenge of applying self-supervised learning to real-world long-tail data distributions, offering an incremental improvement with a simple hyperparameter adjustment.

The paper studies self-supervised contrastive learning on long-tail data. It analyses the role of the temperature parameter and proposes a dynamic cosine schedule that improves class separation at no extra computational cost, yielding consistent gains in representation quality.

Most approaches for self-supervised learning (SSL) are optimised on curated balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibits long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter $τ$ in the contrastive loss, by analysing the loss through the lens of average distance maximisation, and find that a large $τ$ emphasises group-wise discrimination, whereas a small $τ$ leads to a higher degree of instance discrimination. While $τ$ has thus far been treated exclusively as a constant hyperparameter, in this work, we propose to employ a dynamic $τ$ and show that a simple cosine schedule can yield significant improvements in the learnt representations. Such a schedule results in a constant 'task switching' between an emphasis on instance discrimination and group-wise discrimination and thereby ensures that the model learns both group-wise features and instance-specific details. Since frequent classes benefit from the former, while infrequent classes require the latter, we find this method to consistently improve separation between the classes in long-tail data without any additional computational cost.
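
The idea of a dynamic $τ$ can be illustrated with a minimal sketch. The abstract does not state the exact schedule or its constants, so everything below is an assumption made for illustration: the function names `cosine_temperature` and `info_nce`, the bounds `tau_min=0.1` and `tau_max=1.0`, the `period` argument, and the choice of an oscillating cosine that starts at the large temperature. The loss is a generic single-anchor InfoNCE form, not necessarily the exact loss used in the paper.

```python
import math


def cosine_temperature(step, period, tau_min=0.1, tau_max=1.0):
    """Hypothetical oscillating cosine schedule for the temperature.

    Starts at tau_max, reaches tau_min half a period later, and returns
    to tau_max after a full period. The bounds are illustrative only.
    """
    phase = 2.0 * math.pi * step / period
    return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + math.cos(phase))


def info_nce(similarities, positive_index, tau):
    """Generic InfoNCE loss for one anchor.

    similarities: cosine similarities between the anchor and each candidate.
    positive_index: position of the positive sample in that list.
    tau: temperature that scales the similarities before the softmax.
    """
    logits = [s / tau for s in similarities]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - logits[positive_index]


if __name__ == "__main__":
    sims = [0.9, 0.2, -0.1, 0.4]  # made-up similarities; index 0 is the positive
    for step in (0, 25, 50, 75, 100):
        tau = cosine_temperature(step, period=100)
        print(step, round(tau, 3), round(info_nce(sims, 0, tau), 4))
```

In a training loop, the schedule would simply be evaluated once per step or epoch and the result passed to the loss, which is why the approach adds no computational cost beyond a single cosine evaluation.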
