BiTimeBERT: Extending Pre-Trained Language Representations with Bi-Temporal Information
This work addresses the need for time-aware language representations in NLP and IR tasks and offers a novel method with substantial gains, although it is an incremental extension of existing pre-training approaches.
The authors tackled the problem of incorporating temporal information into pre-trained language models to improve performance on time-related NLP tasks, achieving a 155% accuracy improvement over BERT on event time estimation.
Time is an important aspect of documents and is used in a range of NLP and IR tasks. In this work, we investigate methods for incorporating temporal information during pre-training to further improve performance on time-related tasks. Compared with common pre-trained language models such as BERT, which use synchronic document collections (e.g., BookCorpus and Wikipedia) as training corpora, we build word representations from a long-span temporal collection of news articles. We introduce BiTimeBERT, a novel language representation model trained on a temporal collection of news articles via two new pre-training tasks that harness two distinct temporal signals to construct time-aware language representations. The experimental results show that BiTimeBERT consistently outperforms BERT and other existing pre-trained models, with substantial gains on different downstream NLP tasks and applications for which time is important (e.g., the accuracy improvement over BERT is 155% on the event time estimation task).
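Assuming the 155% figure is a relative gain (the abstract does not say whether it is relative or in percentage points), it means BiTimeBERT's accuracy is about 2.55 times BERT's. The sketch below shows that arithmetic. The function name `relative_improvement` and the accuracy values are hypothetical, chosen only for illustration, and are not results reported for either model.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the relative gain of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical accuracies picked only to illustrate a 155% relative gain;
# these are NOT figures reported for BERT or BiTimeBERT.
print(round(relative_improvement(0.20, 0.51), 2))  # prints 155.0
```

Under this reading, any pair of accuracies with `new == 2.55 * baseline` yields a 155% relative improvement, whatever the baseline level.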