CL · DL · Dec 18, 2019

Towards an automatic recognition of mixed languages: The Ukrainian-Russian hybrid language Surzhyk

arXiv:1912.08582v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the computational-linguistics challenge of recognizing mixed languages, but it is an incremental step: it focuses on a single case and has no broad state-of-the-art impact.

The paper tackles the problem of automatically identifying Surzhyk, a Ukrainian-Russian hybrid language. It develops example-based rules in the R programming language and then tests how effectively the resulting code identifies Surzhyk patterns.

Language interference is common in today's multilingual societies, where more and more languages come into contact, and it ultimately leads to the creation of hybrid languages. The emergence of these languages, together with doubts about their right to official recognition, has raised in computational linguistics the problem of identifying and further processing them automatically. In this paper, we propose a first attempt to identify the elements of a Ukrainian-Russian hybrid language, Surzhyk, by means of example-based rules built with the tools of the R programming language. Our example-based study consists of: 1) analysis of spoken samples of Surzhyk recorded by Del Gaudio (2010) in the Kyiv area and creation of a written corpus; 2) formulation of specific rules for identifying Surzhyk patterns and their implementation; 3) testing the code and analysing its effectiveness.
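The abstract does not list the rules themselves, so the following is only a rough illustration of what an example-based identification rule and its evaluation (steps 2 and 3) might look like, written in Python rather than the authors' R. Everything in it is an assumption: the two rules (flag a sentence when it combines Ukrainian-only letters with Russian-only letters, or with a word from a tiny hypothetical Russian lexicon), the names `classify_token`, `is_surzhyk_candidate` and `precision_recall`, and the three toy sentences with their hand-assigned labels. None of these come from the paper's rules or from Del Gaudio's (2010) recordings.

```python
import re

# Letters of standard Ukrainian orthography that Russian lacks, and vice versa.
UKRAINIAN_ONLY = set("іїєґ")
RUSSIAN_ONLY = set("ыэъё")

# Hypothetical mini-lexicon of Russian words (Ukrainian: звичайно, теж, зараз).
RUSSIAN_LEXEMES = {"конечно", "тоже", "сейчас"}

# Lowercase Cyrillic word tokens, including Ukrainian letters and apostrophes.
WORD_RE = re.compile(r"[а-яёіїєґ'’ʼ]+")


def classify_token(token: str) -> str:
    """Label a lowercase token as 'uk', 'ru', 'mixed' or 'neutral'."""
    letters = set(token)
    has_ru = token in RUSSIAN_LEXEMES or bool(letters & RUSSIAN_ONLY)
    has_uk = bool(letters & UKRAINIAN_ONLY)
    if has_ru and has_uk:
        return "mixed"  # hybrid spelling inside a single word
    if has_ru:
        return "ru"
    if has_uk:
        return "uk"
    return "neutral"  # spelled identically in both orthographies


def is_surzhyk_candidate(sentence: str) -> bool:
    """Hypothetical rule: flag sentences containing both Ukrainian and Russian markers."""
    tokens = WORD_RE.findall(sentence.lower())
    labels = {classify_token(t) for t in tokens}
    return "mixed" in labels or {"uk", "ru"} <= labels


def precision_recall(predicted: list[bool], gold: list[bool]) -> tuple[float, float]:
    """Score rule output against hand-labelled sentences."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy sentences: mixed, standard Ukrainian, standard Russian (labels are illustrative).
    sentences = ["Я тоже піду додому.", "Я теж піду додому.", "Я тоже пойду домой."]
    gold = [True, False, False]
    predicted = [is_surzhyk_candidate(s) for s in sentences]
    print(predicted)  # [True, False, False]
    print(precision_recall(predicted, gold))  # (1.0, 1.0)
```

Rules of this kind only catch mixing that is visible in spelling or in a listed word; interference carried by words spelled identically in both languages (here "я", "пойду", "додому") passes unnoticed, which is one reason a real system needs rules derived from an annotated corpus.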

