LG · ML · Jan 30, 2023

Massively Scaling Heteroscedastic Classifiers

arXiv:2301.12860v1 · 11 citations · h-index: 52
Originality: Incremental advance
AI Analysis

HET-XL addresses the scalability of heteroscedastic classifiers in large-scale image classification, which enables broader application in domains such as multimodal learning. The advance is incremental because it builds on existing heteroscedastic methods.

The paper scales heteroscedastic classifiers to large datasets by proposing HET-XL, whose additional parameter count relative to a standard classifier is independent of the number of classes rather than linear in it. HET-XL also learns the temperature hyperparameter directly on the training data instead of tuning it on a held-out set. On datasets with up to 4B images and 30k classes, it requires 14x fewer additional parameters than the baseline heteroscedastic classifier and performs consistently better.
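To make the scaling claim concrete, the following minimal sketch counts the extra parameters of two hypothetical covariance heads. The layer shapes, the rank `r`, and all function names are assumptions for illustration and are not taken from the paper. In particular, modelling the noise in the `d`-dimensional pre-logit space is only one way a count could become independent of the class count `k`; the summary above does not say how HET-XL achieves this.

```python
def logit_space_extra_params(d, k, r):
    """Extra parameters when the covariance lives over the k logits.

    Hypothetical parameterization: one dense layer maps the d-dim features
    to a (k x r) low-rank factor, another maps them to a k-dim diagonal.
    Every term contains k, so the count grows linearly with the class count.
    """
    factor = d * (k * r) + k * r  # weights + biases of the factor layer
    diag = d * k + k              # weights + biases of the diagonal layer
    return factor + diag


def feature_space_extra_params(d, r):
    """Extra parameters when the covariance lives over the d pre-logits.

    Hypothetical parameterization: the same two dense layers, but with
    outputs of size d*r and d. The class count k never appears.
    """
    factor = d * (d * r) + d * r
    diag = d * d + d
    return factor + diag


# Compare both heads as the number of classes grows (d and r are arbitrary).
for num_classes in (1_000, 30_000):
    print(
        num_classes,
        logit_space_extra_params(d=64, k=num_classes, r=3),
        feature_space_extra_params(d=64, r=3),
    )
```

Doubling `k` doubles the first count and leaves the second unchanged, which is the qualitative behaviour described above. The absolute numbers depend entirely on the assumed shapes and should not be read as the paper's 14x figure.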

Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes. However, compared to standard classifiers, they introduce extra parameters that scale linearly with the number of classes. This makes them infeasible to apply to larger-scale problems. In addition, heteroscedastic classifiers introduce a critical temperature hyperparameter which must be tuned. We propose HET-XL, a heteroscedastic classifier whose parameter count, when compared to a standard classifier, scales independently of the number of classes. In our large-scale settings, we show that we can remove the need to tune the temperature hyperparameter by directly learning it on the training data. On large image classification datasets with up to 4B images and 30k classes, our method requires 14X fewer additional parameters, does not require tuning the temperature on a held-out set, and performs consistently better than the baseline heteroscedastic classifier. HET-XL improves ImageNet 0-shot classification in a multimodal contrastive learning setup, which can be viewed as a 3.5 billion class classification problem.
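As a rough illustration of what "a multivariate Gaussian distribution over prediction logits" with a temperature means in practice, here is a minimal pure-Python sketch. It assumes a low-rank-plus-diagonal covariance and a Monte Carlo average of tempered softmax outputs, which is a common way to evaluate such models but is not confirmed by this summary as the paper's exact procedure. The function names, the fixed inputs, and the sample count are hypothetical; in a real classifier the mean, factor, and diagonal would be produced by a network for each input.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_noise(factor, diag_std, rng):
    """Draw eps ~ N(0, V V^T + diag(diag_std^2)), with V = factor (K x R)."""
    rank = len(factor[0])
    shared = [rng.gauss(0.0, 1.0) for _ in range(rank)]
    return [
        sum(row[j] * shared[j] for j in range(rank)) + std * rng.gauss(0.0, 1.0)
        for row, std in zip(factor, diag_std)
    ]


def heteroscedastic_predict(mean_logits, factor, diag_std, temperature,
                            num_samples=1000, seed=0):
    """Monte Carlo estimate of E[softmax((mean + eps) / temperature)]."""
    rng = random.Random(seed)
    avg = [0.0] * len(mean_logits)
    for _ in range(num_samples):
        eps = sample_noise(factor, diag_std, rng)
        noisy = [m + e for m, e in zip(mean_logits, eps)]
        for i, p in enumerate(softmax(noisy, temperature)):
            avg[i] += p / num_samples
    return avg


# Hypothetical 3-class example: classes 0 and 1 share correlated noise.
probs = heteroscedastic_predict(
    mean_logits=[2.0, 0.5, -1.0],
    factor=[[0.5], [0.5], [0.0]],
    diag_std=[0.1, 0.1, 0.1],
    temperature=1.0,
)
```

With zero noise the estimate reduces to an ordinary tempered softmax. Treating `temperature` as a trainable value rather than a constant selected on held-out data corresponds to the change the abstract describes for the large-scale setting.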

