Modeling the Complexity and Descriptive Adequacy of Construction Grammars
This work addresses the problem of grammar evaluation, which concerns both linguists and computational linguists, and offers an incremental improvement in how the quality of a grammar is modeled.
The paper tackles the problem of evaluating construction grammars by modeling their complexity and descriptive adequacy with Minimum Description Length. Across five languages, it shows that more complex grammars with access to multiple levels of representation provide greater generalizations than single-representation grammars.
This paper uses the Minimum Description Length paradigm to model the complexity of construction grammars (CxGs), operationalized as the encoding size of a grammar, alongside their descriptive adequacy, operationalized as the encoding size of a corpus given a grammar. These two quantities are combined to measure the quality of potential CxGs against unannotated corpora, supporting discovery-device CxGs for English, Spanish, French, German, and Italian. The results show (i) that these grammars provide significant generalizations as measured using compression and (ii) that more complex CxGs with access to multiple levels of representation provide greater generalizations than single-representation CxGs.
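The two-part idea behind this measure can be illustrated with a minimal sketch. It is not the paper's encoding scheme: the flat `bits_per_slot` cost for the grammar, the greedy longest-match parse, and the names `code_length_bits` and `mdl_bits` are all assumptions made for illustration. The sketch only shows how a grammar's encoding size and the corpus's encoding size given that grammar add up to a single score, where lower is better.

```python
import math
from collections import Counter


def code_length_bits(symbols):
    """Bits needed to encode a sequence with an ideal code built
    from the sequence's own empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


def mdl_bits(grammar, corpus, bits_per_slot=1.0):
    """Toy two-part MDL score: L(grammar) + L(corpus | grammar).

    grammar: list of constructions, each a tuple of word strings
             (hypothetical representation, single level only).
    corpus:  list of sentences, each a list of word strings.
    bits_per_slot: assumed flat cost of storing one slot of a construction.
    """
    # Complexity: every slot of every construction costs a fixed number of bits.
    grammar_bits = bits_per_slot * sum(len(c) for c in grammar)

    # Descriptive adequacy: rewrite the corpus using constructions where they
    # match (longest first), falling back to single words elsewhere.
    by_length = sorted(grammar, key=len, reverse=True)
    encoded = []
    for sentence in corpus:
        i = 0
        while i < len(sentence):
            match = next(
                (c for c in by_length if tuple(sentence[i:i + len(c)]) == c),
                None,
            )
            if match is not None:
                encoded.append(match)
                i += len(match)
            else:
                encoded.append(sentence[i])
                i += 1
    corpus_bits = code_length_bits(encoded)

    return grammar_bits + corpus_bits


toy_corpus = [["a", "b"], ["a", "b"]]
baseline = mdl_bits([], toy_corpus)               # no grammar: words only
with_grammar = mdl_bits([("a", "b")], toy_corpus)  # one two-slot construction
```

On this toy corpus the empty grammar costs 4.0 bits in total, while the one-construction grammar costs 2.0 bits (2 bits for the grammar, 0 for the corpus), so the construction pays for itself through compression. A grammar whose constructions rarely match would raise the first term without lowering the second, which is the trade-off the combined measure is meant to capture.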