LG · May 29, 2022

Long-Tailed Learning Requires Feature Learning

Thomas Laurent, James H. von Brecht, Xavier Bresson

arXiv:2205.14553v3 · 5.82 citations · h-index: 45

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of learning from imbalanced data, which is common in domains such as text and images. The approach is incremental in that it builds on existing long-tailed learning frameworks.

The paper studies generalization in long-tailed learning by proposing a data model and showing that a learner succeeds if and only if it identifies the correct features. It also derives non-asymptotic generalization error bounds that quantify the penalty for not learning features.
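The claim that success on rare subcategories depends on identifying the correct features can be illustrated with a toy example. The sketch below is not the authors' construction; it assumes a hypothetical vocabulary in which words are grouped into "concepts" of interchangeable synonyms, each class is seen exactly once in training, and each test point reuses its class's concepts with freshly drawn synonyms. It contrasts a 1-nearest-neighbour learner on raw word ids with one that first maps words to concepts. All names (`NUM_CONCEPTS`, `concept_of`, `run`, etc.) are illustrative.

```python
import itertools
import random

# Hypothetical toy setup (NOT the paper's data model): NUM_CONCEPTS latent concepts,
# each expressed by WORDS_PER_CONCEPT interchangeable words (word ids are ints).
NUM_CONCEPTS = 6
WORDS_PER_CONCEPT = 3
LENGTH = 3


def concept_of(word):
    """The 'correct feature' in this toy: word w expresses concept w // WORDS_PER_CONCEPT."""
    return word // WORDS_PER_CONCEPT


def realize(concepts, rng):
    """Turn a sequence of concepts into words by picking a random synonym for each."""
    return tuple(c * WORDS_PER_CONCEPT + rng.randrange(WORDS_PER_CONCEPT) for c in concepts)


def raw_features(x):
    return x  # identity: compare word ids directly


def concept_features(x):
    return tuple(concept_of(w) for w in x)


def nearest_label(x, train, featurize):
    """1-nearest-neighbour by the number of positions that agree after featurization."""
    fx = featurize(x)

    def score(item):
        return sum(a == b for a, b in zip(fx, featurize(item[0])))

    return max(train, key=score)[1]


def run(num_classes=20, num_test=200, seed=0):
    rng = random.Random(seed)
    # Each class is a distinct concept sequence seen only ONCE in training (a rare subcategory).
    all_seqs = list(itertools.product(range(NUM_CONCEPTS), repeat=LENGTH))
    classes = rng.sample(all_seqs, num_classes)
    train = [(realize(seq, rng), label) for label, seq in enumerate(classes)]
    correct = {"raw": 0, "concept": 0}
    for _ in range(num_test):
        label = rng.randrange(num_classes)
        x = realize(classes[label], rng)  # fresh synonyms at test time
        correct["raw"] += nearest_label(x, train, raw_features) == label
        correct["concept"] += nearest_label(x, train, concept_features) == label
    return {k: v / num_test for k, v in correct.items()}


if __name__ == "__main__":
    print(run())
```

Because each class appears once, raw word matching only succeeds when a test point happens to reuse the training synonyms, whereas mapping words to concepts makes the single training example sufficient. Here the concept map is given rather than learned, so the sketch shows why the feature matters, not how a learner would acquire it.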

We propose a simple data model inspired by natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. Our data model follows a long-tailed distribution in the sense that some rare subcategories have few representatives in the training set. In this context we provide evidence that a learner succeeds if and only if it identifies the correct features, and moreover derive non-asymptotic generalization error bounds that precisely quantify the penalty that one must pay for not learning features.
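To make "some rare subcategories have few representatives in the training set" concrete, the sketch below draws training labels from a Zipf-like distribution over subcategories and counts how many are unseen or seen only once. The power-law weights, the exponent, and the function names are illustrative assumptions, not the paper's data model.

```python
import random
from collections import Counter


def sample_long_tailed_counts(num_subcategories, num_train, exponent=1.0, seed=0):
    """Draw num_train training labels from a Zipf-like distribution over subcategories.

    Assumption (illustrative only): subcategory k (0-indexed) has probability
    proportional to 1 / (k + 1) ** exponent.
    """
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) ** exponent for k in range(num_subcategories)]
    labels = rng.choices(range(num_subcategories), weights=weights, k=num_train)
    return Counter(labels)


def tail_summary(counts, num_subcategories, rare_threshold=1):
    """Return (#subcategories with no examples, #with 1..rare_threshold examples)."""
    unseen = sum(1 for k in range(num_subcategories) if counts[k] == 0)
    rare = sum(1 for k in range(num_subcategories) if 0 < counts[k] <= rare_threshold)
    return unseen, rare


if __name__ == "__main__":
    n_sub, n_train = 1000, 2000
    counts = sample_long_tailed_counts(n_sub, n_train)
    unseen, rare = tail_summary(counts, n_sub)
    print(f"{unseen} of {n_sub} subcategories unseen, {rare} seen exactly once")
```

Even with twice as many training samples as subcategories, a large share of subcategories typically end up with one or zero examples under these weights; this is the regime in which a learner has to generalize from a single representative.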

