LGMLJun 28, 2019

Continual Rare-Class Recognition with Emerging Novel Subclasses

arXiv:1906.12218v15 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of continual learning for rare-class recognition in domains like corporate-risk and disaster document analysis, offering an incremental improvement over existing methods.

The paper tackles the problem of recognizing rare classes in a continuous data stream, including emerging subclasses not seen during training, and demonstrates that RaRecognize outperforms state-of-the-art baselines on three real-world datasets.

Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? We introduce RaRecognize, which (i) estimates a general decision boundary between the rare and the majority class, (ii) learns to recognize individual rare subclasses that exist within the training data, as well as (iii) flags instances from previously unseen rare subclasses as newly emerging. The learner in (i) is general in the sense that by construction it is dissimilar to the specialized learners in (ii), thus distinguishes minority from the majority without overly tuning to what is seen in the training data. Thanks to this generality, RaRecognize ignores all future instances that it labels as majority and recognizes the recurrent as well as emerging rare subclasses only. This saves effort at test time as well as ensures that the model size grows moderately over time as it only maintains specialized minority learners. Through extensive experiments, we show that RaRecognize outperforms state-of-the art baselines on three real-world datasets that contain corporate-risk and disaster documents as rare classes.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes