Source codes in human communication
This work addresses a foundational issue in linguistics that matters to researchers studying probabilistic models of language, but its contribution is incremental: it builds on existing critiques and introduces no new empirical results.
The paper tackles the problem of applying information theory to human communication. It focuses on a mismatch: communication systems rely on predefined codes shared by sender and receiver, whereas no speaker of a natural language has access to the entire linguistic code. It concludes that the distributional properties of languages meet the challenges this mismatch raises and suggest a different view of human communication.
Although information-theoretic characterizations of human communication have become increasingly popular in linguistics, to date they have largely involved grafting probabilistic constructs onto older ideas about grammar. Similarities between human and digital communication have been strongly emphasized, and differences largely ignored. However, some of these differences matter: communication systems are based on predefined codes shared by every sender-receiver, whereas the distributions of words in natural languages guarantee that no speaker-hearer ever has access to an entire linguistic code, which seemingly undermines the idea that natural languages are probabilistic systems in any meaningful sense. This paper describes how the distributional properties of languages meet the various challenges arising from the differences between information systems and natural languages, along with the very different view of human communication these properties suggest.