CL AI · May 11, 2022

Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

arXiv:2205.05435v6 · 28 citations · h-index: 43
Originality: Synthesis-oriented
AI Analysis

This paper addresses the practical issue of model obsolescence for users who rely on long-term text classification, but its contribution is incremental: it focuses on evaluation and analysis rather than proposing a new solution.

The paper tackles the problem of text classification models losing performance over time as data changes. It establishes an evaluation setup and assesses the temporal persistence of various models, and the role of dataset characteristics, across three longitudinal datasets spanning 6 to 19 years.

Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. Being able to predict how well a model will persist over time can therefore help design models that remain effective over a longer period. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments, training and testing on data that are different numbers of years apart, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on classification performance, and makes it possible to measure the extent of persistence over time.
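The year-split evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`temporal_gap_scores`, `train_token_vote`, `accuracy`), the toy data, and the token-voting classifier are all hypothetical stand-ins for the language models and datasets used in the paper. The sketch trains on each year, tests on every year, and averages the scores by the gap between test year and training year.

```python
from collections import Counter, defaultdict
from statistics import mean


def temporal_gap_scores(data_by_year, train_fn, score_fn):
    """Train on each year, test on every year, average scores per gap.

    gap = test_year - train_year. Negative gaps test on the past,
    positive gaps on the future, and 0 is the same-year baseline.
    Simplification: the same-year case reuses the training examples;
    a real setup would hold out a separate test split for each year.
    """
    scores = defaultdict(list)
    for train_year, train_data in data_by_year.items():
        model = train_fn(train_data)
        for test_year, test_data in data_by_year.items():
            gap = test_year - train_year
            scores[gap].append(score_fn(model, test_data))
    return {gap: mean(vals) for gap, vals in sorted(scores.items())}


def train_token_vote(examples):
    """Hypothetical toy classifier: map each token to its most frequent label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for token in text.split():
            counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def accuracy(model, examples):
    """Fraction of examples whose majority token vote matches the label."""
    correct = 0
    for text, label in examples:
        votes = Counter(model[t] for t in text.split() if t in model)
        # Unseen vocabulary yields no votes, so the prediction is None.
        pred = votes.most_common(1)[0][0] if votes else None
        correct += pred == label
    return correct / len(examples)


# Toy data (assumed for illustration): vocabulary drifts from year to year.
toy = {
    2019: [("great", "pos"), ("awful", "neg")],
    2020: [("great", "pos"), ("mid", "neg")],
    2021: [("fire", "pos"), ("mid", "neg")],
}

for gap, score in temporal_gap_scores(toy, train_token_vote, accuracy).items():
    print(f"gap {gap:+d}: accuracy {score:.2f}")
```

On this toy data, accuracy falls as the absolute gap grows, because vocabulary seen in one year is missing in others. Keeping `train_fn` and `score_fn` as parameters means any classifier and metric can be plugged in without changing the gap bookkeeping.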

Code Implementations: 1 repo