Discriminating Between Similar Nordic Languages
This work aims to improve language identification accuracy for users and applications dealing with closely related Nordic languages, where existing tools struggle.
This paper addresses the challenge of discriminating between six closely related Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic. It proposes a machine learning approach to improve automatic language identification for these languages, which are often miscategorised by current state-of-the-art tools.
Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which are often miscategorised by existing state-of-the-art tools. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic.