LG May 29, 2022

Long-Tailed Learning Requires Feature Learning

arXiv:2205.14553v32 citations h-index: 45
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of learning from imbalanced data, which is common in domains such as text and images. The approach is incremental because it builds on existing long-tailed learning frameworks.

The paper tackles generalization in long-tailed learning. It proposes a data model, provides evidence that success depends on identifying the correct features, and derives non-asymptotic error bounds that quantify the penalty for not learning features.

We propose a simple data model inspired by natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. Our data model follows a long-tailed distribution in the sense that some rare subcategories have few representatives in the training set. In this context, we provide evidence that a learner succeeds if and only if it identifies the correct features, and moreover derive non-asymptotic generalization error bounds that precisely quantify the penalty that one must pay for not learning features.
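
To make the phrase "some rare subcategories have few representatives in the training set" concrete, the sketch below draws subcategory labels from a Zipf-like law and counts how many training samples each subcategory receives. This is only a toy illustration, not the data model from the paper: the function name `sample_long_tailed_counts`, the Zipf-like weights, and all parameter values are assumptions chosen for the example.

```python
import random
from collections import Counter


def sample_long_tailed_counts(num_subcategories=50, num_samples=1000,
                              exponent=1.0, seed=0):
    """Return per-subcategory sample counts for a toy long-tailed training set.

    Assumption (not from the paper): subcategory k (1-indexed) is drawn with
    probability proportional to 1 / k**exponent, i.e. a Zipf-like law.
    """
    rng = random.Random(seed)
    weights = [1.0 / (k ** exponent) for k in range(1, num_subcategories + 1)]
    # Draw one subcategory index per training sample.
    labels = rng.choices(range(num_subcategories), weights=weights,
                         k=num_samples)
    tally = Counter(labels)
    # Subcategories that were never drawn get a count of 0.
    return [tally.get(k, 0) for k in range(num_subcategories)]


counts = sample_long_tailed_counts()
rare = sum(1 for c in counts if c <= 5)
print("samples in the most frequent subcategory:", counts[0])
print("subcategories with at most 5 samples:", rare)
```

Under these assumed weights, a handful of head subcategories absorb most of the samples while many tail subcategories are seen only a few times, which is the regime the abstract describes.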

