Time Masking for Temporal Language Models
This work addresses the problem of adapting language models to content that changes over time, which matters for many NLP applications. The contribution appears incremental, however, since it builds on existing masking techniques.
The paper tackles the limitation of static language models by proposing TempoBERT, a temporal contextual language model that uses time as additional context and applies time masking. Both semantic change detection and sentence time prediction benefit from this approach across diverse datasets.
Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time. In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking, that is, masking applied specifically to the supplementary time information. We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on datasets that are diverse in time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.
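To make the idea of modifying texts with temporal information and then masking that information more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's implementation: the abstract does not say how time is attached to a text, so the sketch assumes a time token such as `<1850>` is prepended to the token sequence. The names `add_time_token`, `apply_time_masking`, `time_mask_prob`, and the `[MASK]` symbol are hypothetical choices made for this example.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed BERT-style mask symbol


def add_time_token(tokens, time_label):
    """Prepend a hypothetical time token, e.g. "<1850>", to a token list."""
    return [f"<{time_label}>"] + list(tokens)


def apply_time_masking(tokens, time_mask_prob, rng):
    """Mask the leading time token with probability time_mask_prob.

    Returns (masked_tokens, labels). labels holds the original token at
    the masked position and None everywhere else, so a model could be
    trained to recover the time token from the text alone.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    if rng.random() < time_mask_prob:
        labels[0] = masked[0]
        masked[0] = MASK_TOKEN
    return masked, labels


rng = random.Random(0)
tokens = add_time_token("the cell was small".split(), "1850")

# With probability 1.0 the time token is always masked.
masked, labels = apply_time_masking(tokens, time_mask_prob=1.0, rng=rng)

# With probability 0.0 the sequence is left untouched.
unmasked, no_labels = apply_time_masking(tokens, time_mask_prob=0.0, rng=rng)
```

In this sketch, predicting the masked time token from the remaining words corresponds loosely to sentence time prediction, while leaving the time token visible lets the surrounding words be interpreted in their temporal context. How the actual model combines time masking with ordinary token masking is not specified in the abstract and is left out here.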