CL · IT · AO · SOC-PH · Nov 20, 2013

Complexity measurement of natural and artificial languages

arXiv:1311.5427v2 · 29 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides a method for differentiating artificial from natural languages through complexity analysis, which could aid fields such as linguistics or software analysis, though it appears incremental in that it applies existing entropy concepts to this comparison.

The authors compared entropy between natural languages (English, Spanish) and artificial languages (computer code), finding that code has higher entropy than natural language texts of similar length, and Spanish shows more symbolic diversity than English. They developed expressions to measure entropy, emergence, self-organization, and complexity based on diversity and message length.

We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones. Results showed that algorithms based on complexity measures differentiate artificial from natural languages, and that text analysis based on complexity measures allows the unveiling of important aspects of their nature. We propose specific expressions to examine entropy-related aspects of texts and estimate the values of entropy, emergence, self-organization and complexity based on specific diversity and message length.
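
To make the quantities named in the abstract concrete, here is a minimal sketch of how they could be computed for a text. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: symbols are whitespace-separated tokens, entropy is Shannon entropy normalized by using the number of distinct symbols as the logarithm base, emergence equals that normalized entropy, self-organization is its complement, and complexity is 4 times their product. The function name `complexity_measures` and the dictionary keys are hypothetical.

```python
import math
from collections import Counter


def complexity_measures(text):
    """Estimate entropy-related measures for a text (illustrative sketch).

    Assumptions (not taken from the abstract): whitespace tokenization,
    entropy normalized to [0, 1] via log base D, emergence = entropy,
    self-organization = 1 - emergence, complexity = 4 * E * S.
    """
    tokens = text.split()
    length = len(tokens)        # message length N, in symbols
    counts = Counter(tokens)
    diversity = len(counts)     # number of distinct symbols D

    if length == 0:
        raise ValueError("text contains no symbols")

    if diversity < 2:
        # A single repeated symbol carries no information.
        entropy = 0.0
    else:
        # Shannon entropy with logarithm base D, so the maximum is 1.
        entropy = -sum(
            (c / length) * math.log(c / length, diversity)
            for c in counts.values()
        )

    emergence = entropy
    self_organization = 1.0 - emergence
    complexity = 4.0 * emergence * self_organization

    return {
        "length": length,
        "diversity": diversity,
        "specific_diversity": diversity / length,  # D / N
        "entropy": entropy,
        "emergence": emergence,
        "self_organization": self_organization,
        "complexity": complexity,
    }


if __name__ == "__main__":
    for sample in ("a b a b", "a a a b", "a a a a"):
        print(sample, complexity_measures(sample))
```

Under these assumptions, a text whose symbols are all equally frequent reaches the maximum normalized entropy, a text that repeats one symbol has zero entropy, and complexity peaks between those two extremes. Comparing natural-language text with source code as the abstract describes would additionally require texts of similar length and a tokenizer suited to each language, neither of which this sketch addresses.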

