An Improved Historical Embedding without Alignment
This work addresses the computational cost and contextual variability of aligning word embeddings for historical language analysis, offering a more scalable solution for researchers in computational linguistics and cultural evolution.
The paper tackles the problem of detecting semantic change in words over time by proposing a scalable method that encodes words from different periods into a single vector space, eliminating the need for alignment. It outperforms three other methods on the Google Books N-gram dataset in correctly identifying words with meaning changes.
Many words have evolved in meaning as a result of cultural and social change. Understanding such changes is crucial for modelling language and cultural evolution. Low-dimensional embedding methods have shown promise in detecting changes in word meaning by encoding words as dense vectors. However, when exploring the semantic change of words over time, these methods require the alignment of word embeddings across different time periods. This process is computationally expensive, prohibitively time-consuming, and sensitive to contextual variability. In this paper, we propose a new and scalable method for encoding words from different time periods into one dense vector space. This can greatly improve performance when it comes to identifying words that have changed in meaning over time. We evaluated our method on a dataset from the Google Books N-gram corpus. Our method outperformed three other popular methods in terms of the number of words correctly identified as having changed in meaning. Additionally, we provide an intuitive visualization of the semantic evolution of some words extracted by our method.
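To make the alignment step mentioned above concrete, the following is a minimal sketch of one common way to align embeddings from two periods, orthogonal Procrustes. The abstract does not say which alignment procedure the compared methods use, so this choice is an assumption, and the sketch is not the method proposed in the paper. The vocabulary, the random embeddings, and the names `procrustes_align` and `cosine_distance` are all hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` using the orthogonal matrix R
    that minimizes the Frobenius norm of (source @ R - target)."""
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """1 - cosine similarity; larger values suggest a larger shift."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical toy data: 5 words, 4-dimensional embeddings per period.
rng = np.random.default_rng(0)
vocab = ["gay", "cell", "broadcast", "tree", "water"]
emb_early = rng.normal(size=(len(vocab), 4))

# Simulate a later period as an arbitrary rotation of the same space,
# which is the situation alignment is meant to undo.
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
emb_late = emb_early @ rotation
emb_late[0] += 2.0  # perturb one word to mimic a change in meaning

# Align the early space to the late one, then score each word.
aligned = procrustes_align(emb_early, emb_late)
scores = {
    word: cosine_distance(aligned[i], emb_late[i])
    for i, word in enumerate(vocab)
}
```

With real data, one such alignment is needed for every pair of periods being compared, and each requires an SVD over the shared vocabulary; this is the kind of cost that a single shared vector space avoids.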