Language Identification of Devanagari Poems
This work matters to researchers and developers building text processing pipelines for Devanagari-based Indian languages, particularly for poem analysis, because it provides a method for automatic language identification.
This paper addresses the problem of identifying the language of poems written in 10 Devanagari-based Indian languages. The authors collated poem corpora, studied lexical similarities, and applied supervised machine learning and deep learning techniques for language identification.
Language identification is an important component of many text processing pipelines and has been researched extensively. This paper proposes a procedure for automatic language identification of poems, as a step in poem analysis, covering 10 Devanagari-based languages of India: Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, and Maithili. We collated corpora of poems of varying lengths and studied the lexical similarity of poems across the 10 languages. Finally, we apply and evaluate several language identification systems based on supervised machine learning and deep learning techniques.
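The abstract does not say which similarity measure or which classifiers were used, so the following is only an illustrative sketch of the two steps it describes, under stated assumptions. Lexical similarity is approximated here by the Jaccard overlap of whitespace-token vocabularies, and the supervised identifier is a small character-bigram naive Bayes model. Both are stand-ins chosen for brevity, not the paper's method. The function and class names, the labels `lang_a` and `lang_b`, and the toy strings are hypothetical placeholders rather than real poem data.

```python
import math
from collections import Counter, defaultdict


def vocabulary(poems):
    """Set of whitespace-delimited tokens across a list of poems."""
    return {tok for poem in poems for tok in poem.split()}


def jaccard(vocab_a, vocab_b):
    """Jaccard overlap |A & B| / |A | B| between two vocabularies."""
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


def char_ngrams(text, n=2):
    """Character n-grams of a string (operates on Unicode code points)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()         # label -> number of poems
        self.ngram_vocab = set()            # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.ngram_vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.ngram_vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counter.values()) + vocab_size
            for gram in char_ngrams(text, self.n):
                score += math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; a real one would hold Devanagari poems per language.
corpus = {
    "lang_a": ["ka kha ga", "ka ga gha"],
    "lang_b": ["ka cha ja", "cha ja jha"],
}

# The two vocabularies share one token out of seven, so the overlap is 1/7.
similarity = jaccard(vocabulary(corpus["lang_a"]), vocabulary(corpus["lang_b"]))

texts = [poem for poems in corpus.values() for poem in poems]
labels = [label for label, poems in corpus.items() for _ in poems]
model = NgramNaiveBayes(n=2).fit(texts, labels)
prediction = model.predict("kha ga")
```

Character n-grams are used here because closely related languages written in the same script tend to share many whole words, so sub-word patterns may carry more of the distinguishing signal. Whether that holds for these 10 languages is something the paper's evaluation, not this sketch, would have to show.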