Complexity-entropy analysis at different levels of organization in written language
This work addresses the challenge of analyzing complexity in written language for researchers in linguistics and information theory, though it appears incremental, since it applies existing entropic methods to the organization of text.
The paper tackles the problem of quantifying the balance between predictability and surprise in written language. It applies entropic measures to assess innovation and context preservation at different organizational levels of a text, and notes that the same analysis can be extended to other complex messages such as DNA.
Written language is complex. A written text can be considered an attempt to convey a meaningful message that ends up being constrained by language rules and context dependence, and that is highly redundant in its use of resources. Despite all these constraints, unpredictability is an essential element of natural language. Here we present the use of entropic measures to assess the balance between predictability and surprise in written text. In short, it is possible to measure innovation and context preservation in a document. We show that this can also be done at the different levels of organization of a text. The type of analysis presented is reasonably general and can also be used to analyze the same balance in other complex messages such as DNA, where a hierarchy of organizational levels is known to exist.
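As a rough illustration of what measuring predictability and surprise at different levels of organization could look like, the sketch below estimates Shannon block entropies of the same text treated first as a sequence of characters and then as a sequence of words. This is a minimal sketch under stated assumptions, not the specific entropic measures used in the paper: the function names `block_entropy` and `entropy_gain`, the choice of plain block-frequency estimates, and the sample sentence are all hypothetical choices made for illustration. Note that such naive estimates are biased on short texts.

```python
import math
from collections import Counter


def block_entropy(symbols, n):
    """Shannon entropy (in bits) of the empirical distribution of length-n blocks.

    Assumption: a plain frequency estimate; no finite-size correction is applied.
    """
    if n == 0:
        return 0.0
    blocks = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_gain(symbols, n):
    """H(n) - H(n-1): estimated surprise of a symbol given the n-1 preceding ones.

    Lower values suggest that context makes the next symbol more predictable.
    """
    return block_entropy(symbols, n) - block_entropy(symbols, n - 1)


if __name__ == "__main__":
    # Hypothetical sample text, chosen only to make the example self-contained.
    text = "the cat sat on the mat and the cat sat on the hat"

    # Two levels of organization of the same message.
    levels = {"characters": list(text), "words": text.split()}

    for name, seq in levels.items():
        h1 = block_entropy(seq, 1)   # surprise without context
        h2 = entropy_gain(seq, 2)    # surprise given one symbol of context
        print(f"{name}: H(1) = {h1:.3f} bits, H(2) - H(1) = {h2:.3f} bits")
```

Comparing the context-free entropy with the entropy gain at each level gives a crude sense of how much preceding context reduces surprise; the same functions accept any sequence of hashable symbols, so a nucleotide string could be passed in the same way.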