IIRC: Incremental Implicitly-Refined Classification
This work addresses the challenge of catastrophic forgetting and label inference for lifelong learning models in real-life scenarios where learners encounter entities at varying granularity levels, providing a new benchmark for practitioners.
The paper introduces the Incremental Implicitly-Refined Classification (IIRC) setup, which extends class incremental learning to include two granularity levels for labels (e.g., "bear" and "polar bear"). Only one label is provided at a time, and the model must infer the other label if it has already learned it. The authors develop a standardized benchmark to evaluate state-of-the-art lifelong learning algorithms in this new setup, and find that distillation-based methods perform relatively well but tend to predict too many labels per image.
We introduce the "Incremental Implicitly-Refined Classification (IIRC)" setup, an extension to the class incremental learning setup where the incoming batches of classes have two granularity levels, i.e., each sample could have a high-level (coarse) label like "bear" and a low-level (fine) label like "polar bear". Only one label is provided at a time, and the model has to figure out the other label if it has already learned it. This setup is more aligned with real-life scenarios, where a learner usually interacts with the same family of entities multiple times and discovers more granularity about them, while still trying not to forget previous knowledge. Moreover, this setup enables evaluating models for some important lifelong learning challenges that cannot be easily addressed under the existing setups. These challenges can be motivated by the example: "if a model was trained on the class bear in one task and on polar bear in another task, will it forget the concept of bear, will it rightfully infer that a polar bear is still a bear, and will it wrongfully associate the label of polar bear with other breeds of bear?" We develop a standardized benchmark that enables evaluating models on the IIRC setup. We evaluate several state-of-the-art lifelong learning algorithms and highlight their strengths and limitations. For example, distillation-based methods perform relatively well but are prone to incorrectly predicting too many labels per image. We hope that the proposed setup, along with the benchmark, will provide a meaningful problem setting to practitioners.
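The labeling protocol described above can be illustrated with a minimal Python sketch. It is not the benchmark's code: the `SUPERCLASS` mapping, the function names, and the use of a plain Jaccard score to expose over-prediction are all assumptions made for illustration only.

```python
from typing import Dict, Optional, Set

# Hypothetical two-level hierarchy (fine label -> coarse label), for illustration only.
SUPERCLASS: Dict[str, str] = {"polar bear": "bear", "black bear": "bear"}


def applicable_labels(fine: str) -> Set[str]:
    """All labels that are true for a sample whose finest label is `fine`."""
    labels = {fine}
    if fine in SUPERCLASS:
        labels.add(SUPERCLASS[fine])
    return labels


def provided_label(fine: str, task_classes: Set[str]) -> Optional[str]:
    """The single label given during training: the one applicable label
    that belongs to the current task, or None if the sample is not in this task."""
    for label in applicable_labels(fine):
        if label in task_classes:
            return label
    return None


def expected_labels(fine: str, seen_classes: Set[str]) -> Set[str]:
    """Labels the model should predict at test time: every applicable
    label among the classes it has been trained on so far."""
    return applicable_labels(fine) & seen_classes


def jaccard(predicted: Set[str], target: Set[str]) -> float:
    """Plain set overlap (an assumed metric here); extra predicted labels lower the score."""
    union = predicted | target
    return len(predicted & target) / len(union) if union else 1.0


# Task 1 introduces "bear"; task 2 introduces "polar bear".
seen_after_task_2 = {"bear", "polar bear"}

# A polar bear image should receive both labels once both have been learned.
both = expected_labels("polar bear", seen_after_task_2)

# A black bear image should receive only "bear"; also predicting
# "polar bear" is the over-prediction failure mode mentioned above.
score = jaccard({"bear", "polar bear"}, expected_labels("black bear", seen_after_task_2))
```

In this sketch, a model that attaches "polar bear" to a black bear image scores 0.5 instead of 1.0, which is one simple way to make the "too many labels per image" failure visible.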