LG · Jun 16, 2022

Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective

Sungmin Cha, Jihwan Kwak, Dongsub Shim, Hyunwoo Kim, Moontae Lee, Honglak Lee, Taesup Moon

arXiv:2206.08101v3 · 6.94 citations · h-index: 28

Originality: Incremental advance

AI Analysis

This work highlights a critical gap in how CIL algorithms are evaluated and proposes representation-level analysis to assess them beyond test accuracy alone.
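Linear probing is a standard representation-level evaluation: the trained encoder is frozen, only a linear classifier is fitted on its output features, and the probe's accuracy is read as a measure of representation quality. The sketch below is a minimal pure-Python version under stated assumptions: features are already extracted as lists of floats, and the helper names (`train_linear_probe`, `probe_accuracy`), the bias-free softmax regression, and the hyperparameters are illustrative choices, not the paper's setup.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=200):
    """Hypothetical helper: fit a bias-free softmax classifier on frozen
    features with full-batch gradient descent on the cross-entropy loss."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(num_classes)]
        for x, y in zip(features, labels):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. the class-c logit.
                err = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    grad[c][d] += err * x[d] / n
        for c in range(num_classes):
            for d in range(dim):
                weights[c][d] -= lr * grad[c][d]
    return weights


def probe_accuracy(weights, features, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == y)
    return correct / len(labels)


# Toy frozen features for two classes (illustrative data only).
train_feats = [[2.0, 0.5], [2.0, -0.5], [3.0, 0.0],
               [-2.0, 0.5], [-2.0, -0.5], [-3.0, 0.0]]
train_labels = [0, 0, 0, 1, 1, 1]
probe = train_linear_probe(train_feats, train_labels, num_classes=2)
print(probe_accuracy(probe, [[2.5, 0.2], [-2.5, -0.2]], [0, 1]))  # prints 1.0
```

Because the encoder is never updated, differences in probe accuracy across CIL algorithms reflect differences in the learned features rather than in the classifier head.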

The paper argues that current class incremental learning (CIL) evaluation focuses too heavily on test accuracy and overlooks representation quality. It finds that state-of-the-art algorithms often maintain high accuracy by learning classifiers that closely resemble linear probes, without improving the underlying representations, and sometimes even degrade them relative to naive baselines.
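This page does not say how the resemblance between a CIL classifier and a linear probe is measured. One plausible sketch, assuming both classifiers are given as per-class weight matrices over the same feature space with bias terms ignored, averages the per-class cosine similarity after subtracting the mean class vector from each matrix. The helper names are hypothetical and this is not necessarily the paper's metric.

```python
import math


def center_rows(weights):
    """Subtract the mean class vector. Softmax predictions do not change when
    the same vector is added to every class's weights, so this removes an
    offset that carries no information about the decision rule."""
    dim, k = len(weights[0]), len(weights)
    mean = [sum(row[d] for row in weights) / k for d in range(dim)]
    return [[row[d] - mean[d] for d in range(dim)] for row in weights]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classifier_probe_similarity(cil_weights, probe_weights):
    """Hypothetical metric: mean per-class cosine similarity between the
    centered CIL classifier weights and the centered probe weights."""
    cil_c = center_rows(cil_weights)
    probe_c = center_rows(probe_weights)
    sims = [cosine(a, b) for a, b in zip(cil_c, probe_c)]
    return sum(sims) / len(sims)


probe = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
# Same directions plus a shared offset of (5, 5) on every class: similarity ~1.
print(classifier_probe_similarity([[6.0, 5.0], [5.0, 6.0], [4.0, 4.0]], probe))
# First two classes swapped: only the third class agrees, similarity ~1/3.
print(classifier_probe_similarity([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]], probe))
```

A value near 1 means the CIL head points in roughly the same per-class directions as a probe fitted on the same frozen features, which is the kind of resemblance the summary describes.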

Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data while not forgetting past learned classes. The common evaluation protocol for CIL algorithms is to measure the average test accuracy across all classes learned so far; however, we argue that solely focusing on maximizing the test accuracy may not necessarily lead to developing a CIL algorithm that also continually learns and updates the representations, which may be transferred to the downstream tasks. To that end, we experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning and propose new analysis methods. Our experiments show that most state-of-the-art algorithms prioritize high stability and do not significantly change the learned representation, and sometimes even learn a representation of lower quality than a naive baseline. However, we observe that these algorithms can still achieve high test accuracy because they enable a model to learn a classifier that closely resembles an estimated linear classifier trained for linear probing. Furthermore, the base model learned in the first task, which involves single-task learning, exhibits varying levels of representation quality across different algorithms, and this variance impacts the final performance of CIL algorithms. Therefore, we suggest that the representation-level evaluation should be considered as an additional recipe for more diverse evaluation for CIL algorithms.
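The accuracy-based protocol described above reduces to simple arithmetic. A minimal sketch follows, assuming the model's predictions on one fixed test set are recorded after each task and that each task introduces a disjoint group of classes; the function names and toy labels are hypothetical. Averaging the per-task values (often called average incremental accuracy in the CIL literature) is a common convention, not something this page specifies.

```python
from typing import List, Sequence, Set


def seen_class_accuracy(preds: Sequence[int], labels: Sequence[int],
                        seen_classes: Set[int]) -> float:
    """Accuracy over test samples whose true label is a class learned so far."""
    pairs = [(p, y) for p, y in zip(preds, labels) if y in seen_classes]
    if not pairs:
        return 0.0
    return sum(p == y for p, y in pairs) / len(pairs)


def accuracy_curve(preds_per_task: Sequence[Sequence[int]],
                   labels: Sequence[int],
                   class_splits: Sequence[Sequence[int]]) -> List[float]:
    """Seen-class accuracy measured after each incremental task."""
    seen: Set[int] = set()
    curve = []
    for preds, new_classes in zip(preds_per_task, class_splits):
        seen.update(new_classes)  # classes added by this task
        curve.append(seen_class_accuracy(preds, labels, seen))
    return curve


def average_incremental_accuracy(curve: Sequence[float]) -> float:
    return sum(curve) / len(curve)


labels = [0, 0, 1, 1, 2, 2, 3, 3]
splits = [[0, 1], [2, 3]]                 # task 1: classes 0-1, task 2: classes 2-3
preds_after_task = [[0, 0, 1, 0, 0, 1, 0, 1],  # only samples of classes 0-1 count
                    [0, 1, 1, 1, 2, 2, 3, 3]]
curve = accuracy_curve(preds_after_task, labels, splits)
print(curve)                                # [0.75, 0.875]
print(average_incremental_accuracy(curve))  # 0.8125
```

Note that this number depends only on the final predictions, which is why, as the abstract argues, it cannot distinguish a model whose features improved from one whose classifier merely compensates for unchanged features.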

