IT · CL · Jul 11, 2017

On the letter frequencies and entropy of written Marathi

arXiv:1707.08209v1
Originality: Synthesis-oriented
AI Analysis

This work provides foundational linguistic data for Marathi that could aid text compression and natural language processing. It is incremental, however, since it applies established methods to a new dataset.

The authors analyzed letter frequencies in contemporary written Marathi to identify statistically predominant letter sets and used these to estimate the entropy of the language.
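A minimal sketch of the two basic steps, counting letter frequencies and turning them into a first-order (unigram) Shannon entropy estimate H = -Σ p log2 p, might look like the following. It assumes that each code point in the Unicode Devanagari block (U+0900 to U+097F) counts as one "letter", including vowel signs and the virama; the paper's own letter inventory and its entropy estimator may differ. The function names are hypothetical.

```python
import math
from collections import Counter


def devanagari_letter_counts(text: str) -> Counter:
    """Count code points in the Devanagari block (U+0900-U+097F).

    Assumption: every Devanagari code point (consonant, independent vowel,
    vowel sign, virama, ...) is treated as one 'letter'. Spaces, digits in
    other scripts, and punctuation outside the block are ignored.
    """
    return Counter(ch for ch in text if "\u0900" <= ch <= "\u097f")


def unigram_entropy(counts: Counter) -> float:
    """First-order Shannon entropy H = -sum(p * log2(p)), in bits per letter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Tiny illustrative sample, not a real corpus.
sample = "मराठी भाषा"
counts = devanagari_letter_counts(sample)
for letter, c in counts.most_common():
    print(f"U+{ord(letter):04X} {letter!r}: {c}")
print(f"H1 = {unigram_entropy(counts):.3f} bits/letter")
```

On a real corpus the sample would be replaced by a large body of contemporary Marathi text; with such a short string the estimate only illustrates the arithmetic.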

We carry out a comprehensive analysis of letter frequencies in contemporary written Marathi. We determine sets of letters which statistically predominate in any large generic Marathi text, and use these sets to estimate the entropy of Marathi.
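One plausible way to pick out a "statistically predominant" set of letters is to take the most frequent letters until their combined relative frequency reaches a chosen coverage level. The sketch below assumes that criterion and a hypothetical `coverage` threshold; the abstract does not state the paper's actual selection rule, so this is an illustration, not the authors' method.

```python
from collections import Counter


def predominant_letters(counts: Counter, coverage: float = 0.9) -> list:
    """Return the smallest prefix of letters, ordered by descending
    frequency, whose combined relative frequency reaches `coverage`.

    `coverage` is a hypothetical threshold for illustration only.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(counts.values())
    chosen, cumulative = [], 0
    # most_common() sorts by count, descending; ties keep insertion order.
    for letter, c in counts.most_common():
        chosen.append(letter)
        cumulative += c
        if cumulative / total >= coverage:
            break
    return chosen


# Toy counts with placeholder letters; these are not data from the paper.
toy = Counter({"a": 30, "b": 20, "c": 15, "d": 10, "e": 10, "f": 5, "g": 5, "h": 5})
print(predominant_letters(toy, coverage=0.75))
```

Because the letters are taken in descending order of frequency, the returned set is the smallest one reaching the threshold; raising `coverage` can only add letters to it.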

