Predicting cross-linguistic adjective order with information gain
This research offers linguists and computational linguists a new computational model for understanding and predicting cross-linguistic adjective ordering, an incremental advance in linguistic theory.
This paper proposes a new quantitative model for predicting adjective order across 32 typologically distinct languages. The model, based on maximizing information gain, accounts for AAN, NAA, and ANA adjective sequences without requiring additional mechanisms.
Languages vary in whether they place multiple adjectives before, after, or around the noun, but they typically exhibit strong intra-language tendencies in the relative order of those adjectives (e.g., the preference for 'big blue box' in English, 'grande boîte bleue' in French, and 'alsundūq al'azraq alkabīr' in Arabic). We advance a new quantitative account of adjective order across typologically distinct languages based on maximizing information gain. Our model handles the left-right asymmetry of French-type ANA sequences with the same approach it applies to AAN and NAA orderings, without appeal to other mechanisms. We find that, across 32 languages, the preferred order of adjectives largely mirrors an efficient algorithm for maximizing information gain.
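To make the notion of information gain concrete, the following is a minimal toy sketch and not the authors' model. It assumes a listener trying to identify one object in a small visual context and measures, for each adjective class, how much the entropy over candidate objects drops once that adjective's value is known. The context, the feature names (`size`, `color`), and the helper functions are all hypothetical illustrations. How such a ranking maps onto surface position relative to the noun is not specified in the abstract and is left out here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the empirical distribution over labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(objects, feature):
    """Reduction in entropy over object identity after learning one feature."""
    ids = [o["id"] for o in objects]
    before = entropy(ids)
    # Group the candidate objects by their value for this feature.
    groups = {}
    for o in objects:
        groups.setdefault(o[feature], []).append(o["id"])
    # Expected entropy remaining once the feature value is known.
    after = sum(len(g) / len(objects) * entropy(g) for g in groups.values())
    return before - after


def rank_by_information_gain(objects, features):
    """Rank features from most to least informative in this context."""
    return sorted(features, key=lambda f: information_gain(objects, f), reverse=True)


# Hypothetical context: four boxes, three of them big, each a different color.
context = [
    {"id": 1, "size": "big", "color": "blue"},
    {"id": 2, "size": "big", "color": "red"},
    {"id": 3, "size": "big", "color": "green"},
    {"id": 4, "size": "small", "color": "yellow"},
]

ranking = rank_by_information_gain(context, ["size", "color"])
```

In this toy context, color singles out each box while size leaves three candidates open, so color ranks above size. A different context would give a different ranking, which is why any cross-linguistic claim has to rest on corpus-level estimates rather than a single scene.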