(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
This work addresses the problem of temporal semantic change, which is relevant to NLP researchers, but its contribution is incremental: it compares existing models on new tasks.
The paper tackles semantic change detection by evaluating ChatGPT, GPT 3.5, and BERT on two diachronic Word-in-Context tasks, TempoWiC and HistoWiC. It finds that ChatGPT performs significantly worse than the foundational GPT 3.5, and that (Chat)GPT is slightly behind BERT in detecting long-term changes but significantly worse in detecting short-term changes.
In the universe of Natural Language Processing, Transformer-based language models like BERT and (Chat)GPT have emerged as lexical superheroes with great power to solve open research problems. In this paper, we specifically focus on the temporal problem of semantic change, and evaluate their ability to solve two diachronic extensions of the Word-in-Context (WiC) task: TempoWiC and HistoWiC. In particular, we investigate the potential of a novel, off-the-shelf technology like ChatGPT (and GPT) 3.5 compared to BERT, which represents a family of models that currently stand as the state-of-the-art for modeling semantic change. Our experiments represent the first attempt to assess the use of (Chat)GPT for studying semantic change. Our results indicate that ChatGPT performs significantly worse than the foundational GPT version. Furthermore, our results demonstrate that (Chat)GPT achieves slightly lower performance than BERT in detecting long-term changes but performs significantly worse in detecting short-term changes.
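As a rough illustration of the Word-in-Context format that both TempoWiC and HistoWiC extend, the sketch below treats each instance as a binary decision: does the target word carry the same meaning in two contexts? It is not the setup used in this paper. The `WiCInstance` class, the cosine-similarity threshold of 0.5, the toy vectors standing in for contextual embeddings, and the use of plain accuracy as the score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class WiCInstance:
    """One hypothetical WiC example: a target word in two contexts."""
    target: str
    vec_a: list          # placeholder embedding of the target in context A
    vec_b: list          # placeholder embedding of the target in context B
    same_meaning: bool   # gold label


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_same_meaning(vec_a, vec_b, threshold=0.5):
    """Predict 'same meaning' when the two vectors are similar enough.

    The threshold is an assumed value, not one taken from the paper.
    """
    return cosine(vec_a, vec_b) >= threshold


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy data: the vectors are made up and carry no linguistic meaning.
instances = [
    WiCInstance("cell", [1.0, 0.0], [0.9, 0.1], True),
    WiCInstance("cell", [1.0, 0.0], [0.0, 1.0], False),
    WiCInstance("plane", [1.0, 1.0], [1.0, -1.0], True),
]

predictions = [predict_same_meaning(i.vec_a, i.vec_b) for i in instances]
gold = [i.same_meaning for i in instances]
score = accuracy(predictions, gold)
print(f"accuracy on toy data: {score:.2f}")
```

In a real evaluation, the placeholder vectors would be replaced by representations from the model under test, or by the model's direct yes/no answer when it is prompted, as a chat model would be.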