A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods
This work provides a comparative analysis of two existing methods for text analysis, an incremental contribution aimed at researchers in computational linguistics.
This paper compares the Taylor and Ebeling & Neiman fluctuation analysis methods for natural language clustering. Applied to text, both methods can distinguish real text from i.i.d. sequences, and Taylor exponents roughly distinguish text categories, doing so better than Ebeling & Neiman exponents.
This article considers the fluctuation analysis methods of Taylor and of Ebeling & Neiman. While both have been applied to various phenomena in the statistical mechanics domain, their similarities and differences have not been clarified. After considering their analytical aspects, this article presents a large-scale application of these methods to text. Both methods are found to distinguish real text from independently and identically distributed (i.i.d.) sequences. Furthermore, the Taylor exponents acquired from words can roughly distinguish text categories; this is also the case for Ebeling & Neiman exponents, but to a lesser extent. Additionally, both methods show some potential for capturing differences between kinds of script.
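
To make the i.i.d. baseline concrete, the following is a minimal sketch of a Taylor-style analysis, not the article's own implementation. It assumes the common formulation in which a token sequence is cut into non-overlapping windows, the mean and standard deviation of each word's per-window count are computed, and the exponent is the slope of a log-log least-squares fit of standard deviation against mean. The function names, the window size, and the Zipf-like sampler are illustrative assumptions.

```python
import numpy as np


def taylor_exponent(tokens, window=1000):
    """Estimate alpha in sigma ~ mu**alpha from per-word window counts.

    Assumed formulation: non-overlapping windows of `window` tokens,
    one (mu, sigma) point per word, least-squares fit in log-log space.
    """
    tokens = np.asarray(tokens)
    n_windows = len(tokens) // window
    if n_windows < 2:
        raise ValueError("need at least two full windows")

    # Map each token to an integer id, dropping the incomplete last window.
    _, ids = np.unique(tokens[: n_windows * window], return_inverse=True)
    ids = ids.ravel()
    vocab_size = int(ids.max()) + 1
    ids = ids.reshape(n_windows, window)

    # counts[w, k] = number of occurrences of word k in window w.
    counts = np.stack([np.bincount(row, minlength=vocab_size) for row in ids])

    mu = counts.mean(axis=0)
    sigma = counts.std(axis=0)

    # Words with zero spread cannot be placed on a log-log plot.
    keep = sigma > 0
    slope, _intercept = np.polyfit(np.log10(mu[keep]), np.log10(sigma[keep]), 1)
    return float(slope)


def iid_zipf_tokens(n_tokens, vocab_size=2000, seed=0):
    """Draw an i.i.d. token sequence with Zipf-like (1/rank) word probabilities.

    Hypothetical helper used only to build a baseline sequence.
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / np.arange(1, vocab_size + 1)
    p /= p.sum()
    return rng.choice(vocab_size, size=n_tokens, p=p)


if __name__ == "__main__":
    baseline = iid_zipf_tokens(200_000)
    print(f"i.i.d. baseline exponent: {taylor_exponent(baseline):.3f}")
```

For an i.i.d. sequence the per-window counts are approximately Poisson, so the estimate should fall close to 0.5. Under this formulation, an exponent noticeably above that value for a real text is what indicates clustering.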