CLJul 1, 2025

Matching and Linking Entries in Historical Swedish Encyclopedias

arXiv:2507.01170v111 citationsh-index: 1Has CodeProceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Originality Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem for historians and digital humanities researchers, providing incremental insights into intellectual changes in Sweden through encyclopedia analysis.

The paper tackled the problem of analyzing historical Swedish encyclopedias by matching and linking entries between editions, revealing a small but significant shift in geographic focus away from Europe towards other regions from the first to second edition, confirming influences like the First World War.

The \textit{Nordisk familjebok} is a Swedish encyclopedia from the 19th and 20th centuries. It was written by a team of experts and aimed to be an intellectual reference, stressing precision and accuracy. This encyclopedia had four main editions remarkable by their size, ranging from 20 to 38 volumes. As a consequence, the \textit{Nordisk familjebok} had a considerable influence in universities, schools, the media, and society overall. As new editions were released, the selection of entries and their content evolved, reflecting intellectual changes in Sweden. In this paper, we used digitized versions from \textit{Project Runeberg}. We first resegmented the raw text into entries and matched pairs of entries between the first and second editions using semantic sentence embeddings. We then extracted the geographical entries from both editions using a transformer-based classifier and linked them to Wikidata. This enabled us to identify geographic trends and possible shifts between the first and second editions, written between 1876-1899 and 1904-1926, respectively. Interpreting the results, we observe a small but significant shift in geographic focus away from Europe and towards North America, Africa, Asia, Australia, and northern Scandinavia from the first to the second edition, confirming the influence of the First World War and the rise of new powers. The code and data are available on GitHub at https://github.com/sibbo/nordisk-familjebok.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes