Non-Parametric Temporal Adaptation for Social Media Topic Classification
The work addresses the cost of repeatedly updating models on social media platforms that face changing trends and privacy laws, though the contribution is incremental, as it builds on existing dense retrieval techniques.
The paper tackles temporal adaptation of NLP models for social media topic classification, handling both test distribution shift and deleted training data without re-training. On a year-long Twitter dataset, its method improves by 64.12% over the best parametric baseline.
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, most current NLP models are static and rely on fixed training data, which means they are unable to adapt to temporal change -- both test distribution shift and deleted training data -- without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a non-parametric dense retrieval technique, which does not require re-training, as a simple but effective solution. In experiments on a newly collected, publicly available, year-long Twitter dataset exhibiting temporal distribution shift, our method improves by 64.12% over the best parametric baseline without any of its costly gradient-based updating. Our dense retrieval approach is also particularly well-suited to dynamically deleted user data in line with data privacy laws, with negligible computational cost and performance loss.
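As a rough illustration of why a retrieval-based classifier can adapt without re-training, the sketch below keeps labeled tweet embeddings in a datastore and predicts a hashtag from the nearest stored neighbors. It is not the paper's implementation: the class name `HashtagDatastore`, its methods, the cosine-similarity scoring, and the similarity-weighted vote over the `k` nearest neighbors are all assumptions made for illustration, and the embeddings are expected to come from some fixed encoder that is not shown here.

```python
import math
from collections import defaultdict


def _normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("embedding must be non-zero")
    return [x / norm for x in vec]


class HashtagDatastore:
    """Hypothetical non-parametric store of (embedding, hashtag) pairs."""

    def __init__(self):
        self._entries = {}  # item_id -> (unit-length embedding, hashtag)

    def add(self, item_id, embedding, hashtag):
        # Adapting to new data is an insertion, not a gradient update.
        self._entries[item_id] = (_normalize(embedding), hashtag)

    def delete(self, item_id):
        # Honoring a deletion request is a removal from the store.
        self._entries.pop(item_id, None)

    def predict(self, query, k=3):
        """Return the hashtag with the largest summed similarity among the top k."""
        if not self._entries:
            return None
        q = _normalize(query)
        scored = sorted(
            (
                (sum(a * b for a, b in zip(vec, q)), tag)
                for vec, tag in self._entries.values()
            ),
            key=lambda pair: pair[0],
            reverse=True,
        )
        votes = defaultdict(float)
        for score, tag in scored[:k]:
            votes[tag] += score
        return max(votes, key=votes.get)
```

In this toy setup, newly posted tweets are handled by calling `add`, and a deleted tweet stops influencing predictions as soon as `delete` is called, with the encoder left untouched. A real system would likely replace the linear scan with an approximate nearest-neighbor index.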