Smotrom tvoja pa ander drogoj verden! Resurrecting Dead Pidgin with Generative Models: Russenorsk Case Study
This work addresses the linguistic analysis and reconstruction of a dead pidgin language for historical and academic purposes, representing an incremental application of existing methods to new data.
The paper analyzes the extinct Russenorsk pidgin by constructing a structured dictionary from surviving sources and using large language models to generate hypotheses about its word formation and grammar. It then develops a translation agent that reconstructs hypothetical Russenorsk texts from modern Russian and Norwegian.
Russenorsk, a pidgin language historically used in trade interactions between Russian and Norwegian speakers, represents a unique linguistic phenomenon. In this paper, we use modern large language models (LLMs) to analyze its lexicon as recorded in surviving literary sources. We construct a structured dictionary of the language, grouped by synonyms and word origins. We then use this dictionary to formulate hypotheses about the core principles of word formation and grammatical structure in Russenorsk, and we show which of the LLM-generated hypotheses correspond to hypotheses previously proposed in the academic literature. We also develop a "reconstruction" translation agent that generates hypothetical Russenorsk renderings of contemporary Russian and Norwegian texts.
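As a rough illustration of what a dictionary "grouped by synonyms and word origins" could look like, the following minimal sketch stores each entry with a gloss and a source language and groups word forms by either field. The `Entry` class, the `group_by` helper, and the field names are hypothetical and are not the paper's actual schema. The handful of sample words are commonly cited Russenorsk forms, included only for illustration and not drawn from the dictionary built in this work.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str    # Russenorsk word form
    gloss: str   # English meaning; entries sharing a gloss are treated as synonyms
    origin: str  # assumed source language of the form


# Illustrative sample only (assumption): not the paper's data.
ENTRIES = [
    Entry("moja", "I", "Russian"),
    Entry("fisk", "fish", "Norwegian"),
    Entry("bra", "good", "Norwegian"),
    Entry("dobra", "good", "Russian"),
    Entry("kupom", "buy", "Russian"),
]


def group_by(entries, field):
    """Group word forms by the value of the given Entry field."""
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, field)].append(entry.form)
    return dict(groups)


synonym_groups = group_by(ENTRIES, "gloss")   # forms sharing a meaning
origin_groups = group_by(ENTRIES, "origin")   # forms sharing a source language
```

Under these assumptions, `synonym_groups["good"]` collects the Norwegian-derived and Russian-derived forms for the same meaning, the kind of doublet a mixed-origin pidgin lexicon would be expected to contain.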