LG · AI · CV · ML · Jan 30, 2022

Generalizing similarity in noisy setups: the DIBS phenomenon

arXiv:2201.12803v3
Originality: Incremental advance
AI Analysis

This work addresses generalization challenges in similarity learning that are relevant to machine learning practitioners, but it is incremental because it builds on existing studies of label noise and double descent.

The paper investigates how data density and label noise affect generalization in Siamese Neural Networks. It finds that on dense datasets, Pair Label Noise (PLN) leads to worse generalization than Single Label Noise (SLN) in the overparametrized region, a phenomenon termed Density-Induced Break of Similarity (DIBS).

This work uncovers an interplay among data density, noise, and generalization ability in similarity learning. We consider Siamese Neural Networks (SNNs), which are the basic form of contrastive learning, and explore two types of noise that can impact SNNs: Pair Label Noise (PLN) and Single Label Noise (SLN). Our investigation reveals that SNNs exhibit double descent behaviour regardless of the training setup and that this behaviour is further exacerbated by noise. We demonstrate that the density of data pairs is crucial for generalization. When SNNs are trained on sparse datasets with the same amount of PLN or SLN, they exhibit comparable generalization properties. However, when using dense datasets, PLN cases generalize worse than SLN ones in the overparametrized region, leading to a phenomenon we call Density-Induced Break of Similarity (DIBS). In this regime, PLN similarity violation becomes macroscopic, corrupting the dataset to the point where complete interpolation cannot be achieved, regardless of the number of model parameters. Our analysis also delves into the correspondence between online optimization and offline generalization in similarity learning. The results show that this equivalence fails in the presence of label noise in all the scenarios considered.
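
The difference between the two noise types can be illustrated with a small sketch. The abstract does not spell out the definitions, so the code assumes that SLN corrupts the class labels of individual samples before pairs are formed, while PLN flips the similarity label of already-formed pairs. All function names (`make_pairs`, `single_label_noise`, `pair_label_noise`, `count_violations`) are hypothetical and chosen for illustration; this is not the authors' implementation. The sketch counts "similarity violations" as triples in which two pairs are marked similar and the third dissimilar, which is one plausible reading of the term.

```python
import itertools
import random


def make_pairs(labels):
    """Map every unordered pair (i, j), i < j, to 1 if the labels match, else 0."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in itertools.combinations(range(len(labels)), 2)
    }


def single_label_noise(labels, num_classes, rate, rng):
    """SLN (assumed definition): corrupt class labels first, then build pairs."""
    noisy = list(labels)
    for i in range(len(noisy)):
        if rng.random() < rate:
            # Replace the label with a different class chosen uniformly.
            noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return make_pairs(noisy)


def pair_label_noise(labels, rate, rng):
    """PLN (assumed definition): build clean pairs, then flip similarity bits."""
    pairs = make_pairs(labels)
    return {k: (1 - v) if rng.random() < rate else v for k, v in pairs.items()}


def count_violations(pairs, n):
    """Count triples with exactly two 'similar' edges, which no labeling can produce."""
    bad = 0
    for i, j, k in itertools.combinations(range(n), 3):
        if pairs[(i, j)] + pairs[(j, k)] + pairs[(i, k)] == 2:
            bad += 1
    return bad


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [c for c in range(3) for _ in range(10)]  # 3 classes, 10 samples each
    n = len(labels)
    print("SLN violations:", count_violations(single_label_noise(labels, 3, 0.2, rng), n))
    print("PLN violations:", count_violations(pair_label_noise(labels, 0.2, rng), n))
```

Under these assumptions, SLN pairs are always derived from some labeling of the samples, so they can never contain an inconsistent triple, whereas PLN can. Such triples only arise when pairs share samples, which is in line with the abstract's statement that the two noise types behave comparably on sparse datasets and diverge on dense ones.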

