CL · Jul 26, 2017

Fast calculation of entropy with Zhang's estimator

arXiv:1707.08290v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a faster way to compute entropy estimates for linguistic and text analysis. It appears incremental, however, because it speeds up an existing estimator rather than proposing a new one.

The authors tackle the problem of estimating entropy from text data efficiently. Their fast algorithm for Zhang's estimator exploits the fact that a text usually contains far fewer distinct frequencies than distinct types, and they support this premise with a statistical analysis of texts in more than 1000 languages.
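To make the "fewer distinct frequencies than types" argument concrete, one standard way to write Zhang's estimator is shown below, for a text of n tokens in which type i occurs f_i times (i = 1, ..., V), together with a regrouping by frequency. This is a sketch: the notation S(f), m_f and F is introduced here for illustration and is not necessarily the paper's own.

```latex
\hat{H}_z = \sum_{i=1}^{V} \frac{f_i}{n} \, S(f_i),
\qquad
S(f) = \sum_{v=1}^{n-f} \frac{1}{v} \prod_{j=0}^{v-1} \left( 1 + \frac{1 - f}{n - 1 - j} \right)

% S depends on type i only through f_i, so types sharing a frequency can be merged.
% Let F be the set of distinct frequencies and m_f the number of types with frequency f:
\hat{H}_z = \sum_{f \in F} m_f \, \frac{f}{n} \, S(f)
```

Evaluating S(f) with a running product takes about n - f steps. The per-type form therefore costs roughly \sum_i (n - f_i) \le Vn steps, while the grouped form costs \sum_{f \in F} (n - f) \le |F| n. The saving is large whenever |F| is much smaller than V.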

Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang's estimator. The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types. We justify the convenience of the algorithm by means of an analysis of the statistical properties of texts from more than 1000 languages. Our work opens up various possibilities for future research.
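The frequency-grouping idea in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the form of Zhang's estimator given in the docstrings (natural logarithm, so the result is in nats) and assumes every count is positive, as it is when tokens of a text are counted. The function names `inner_sum`, `zhang_entropy_naive` and `zhang_entropy_grouped` are hypothetical. Both estimators return the same value, but the grouped one evaluates the costly inner sum once per distinct frequency instead of once per type.

```python
from collections import Counter
from typing import Mapping


def inner_sum(f: int, n: int) -> float:
    """Inner sum of Zhang's estimator for a type with frequency f in a text of n tokens:
    sum_{v=1}^{n-f} (1/v) * prod_{j=0}^{v-1} (1 + (1 - f) / (n - 1 - j)).
    The product is extended by one factor per step, so the cost is O(n - f)."""
    total = 0.0
    prod = 1.0
    for v in range(1, n - f + 1):
        # factor for j = v - 1, where n - 1 - j = n - v (always >= f >= 1 here)
        prod *= 1.0 + (1.0 - f) / (n - v)
        total += prod / v
    return total


def zhang_entropy_naive(counts: Mapping[str, int]) -> float:
    """Zhang's estimate in nats, computing the inner sum once per type."""
    n = sum(counts.values())
    return sum(f / n * inner_sum(f, n) for f in counts.values())


def zhang_entropy_grouped(counts: Mapping[str, int]) -> float:
    """Same estimate, computing the inner sum once per distinct frequency f
    and weighting it by m_f, the number of types that occur exactly f times."""
    n = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # maps f -> m_f
    return sum(m_f * f / n * inner_sum(f, n) for f, m_f in freq_of_freq.items())


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw a cat".split()
    counts = Counter(tokens)
    # prints "6 types, 3 distinct frequencies"
    print(len(counts), "types,", len(set(counts.values())), "distinct frequencies")
    print(zhang_entropy_naive(counts), zhang_entropy_grouped(counts))
    print(zhang_entropy_grouped(Counter("aab")))  # about 0.8333, i.e. 5/6
```

In this toy sentence the grouping saves little, but the abstract's point is that in real texts the number of distinct frequencies is generally much smaller than the number of types, which is where the grouped form pays off.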
