A Method for Learning Large-Scale Computational Construction Grammars from Semantically Annotated Corpora
The method scales usage-based constructionist approaches to language and provides a practical tool for studying English argument structure in broad-coverage corpora.
The authors developed a method to learn large-scale computational construction grammars from semantically annotated corpora, resulting in grammars with tens of thousands of constructions that support frame-semantic analysis of open-domain text.
We present a method for learning large-scale, broad-coverage construction grammars from corpora of language use. Starting from utterances annotated with constituency structure and semantic frames, the method facilitates the learning of human-interpretable computational construction grammars that capture the intricate relationship between syntactic structures and the semantic relations they express. The resulting grammars consist of networks of tens of thousands of constructions formalised within the Fluid Construction Grammar framework. Not only do these grammars support the frame-semantic analysis of open-domain text, they also house a trove of information about the syntactico-semantic usage patterns present in the data they were learnt from. The method and learnt grammars contribute to the scaling of usage-based, constructionist approaches to language, as they corroborate the scalability of a number of fundamental construction grammar conjectures while also providing a practical instrument for the constructionist study of English argument structure in broad-coverage corpora.
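To make the idea of deriving syntactico-semantic usage patterns from annotated utterances more concrete, the following is a minimal sketch in Python. It is not the authors' learning algorithm and does not use Fluid Construction Grammar. It assumes a simplified, hypothetical annotation format in which each frame instance lists its role fillers in surface order, each with a constituent category and a PropBank-style role label. The names `RoleFiller`, `AnnotatedFrame`, `schema_of` and `learn_inventory`, and the three toy utterances, are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleFiller:
    """One constituent that fills a role in a frame (hypothetical format)."""
    category: str  # constituent category, e.g. "NP", "PP", "V"
    role: str      # role label, e.g. "ARG0", "ARG1", "V"


@dataclass(frozen=True)
class AnnotatedFrame:
    """One frame instance evoked in an annotated utterance."""
    lemma: str      # lemma of the frame-evoking word, e.g. "give"
    frame: str      # frame or roleset identifier, e.g. "give.01"
    fillers: tuple  # RoleFiller objects in surface order


def schema_of(instance):
    """Abstract away from the words: keep ordered (category, role) pairs."""
    return tuple((f.category, f.role) for f in instance.fillers)


def learn_inventory(instances):
    """Count schematic patterns and their pairings with individual lemmas."""
    schemas = Counter()   # argument-structure pattern -> frequency
    lexical = Counter()   # (lemma, frame, pattern) -> frequency
    for inst in instances:
        pattern = schema_of(inst)
        schemas[pattern] += 1
        lexical[(inst.lemma, inst.frame, pattern)] += 1
    return schemas, lexical


def filler(category, role):
    return RoleFiller(category, role)


# Toy corpus: "She gave him a book", "He sent her a letter",
# "She gave a book to him" (annotations are invented for illustration).
corpus = [
    AnnotatedFrame("give", "give.01", (
        filler("NP", "ARG0"), filler("V", "V"),
        filler("NP", "ARG2"), filler("NP", "ARG1"))),
    AnnotatedFrame("send", "send.01", (
        filler("NP", "ARG0"), filler("V", "V"),
        filler("NP", "ARG2"), filler("NP", "ARG1"))),
    AnnotatedFrame("give", "give.01", (
        filler("NP", "ARG0"), filler("V", "V"),
        filler("NP", "ARG1"), filler("PP", "ARG2"))),
]

schemas, lexical = learn_inventory(corpus)

for pattern, count in schemas.most_common():
    print(count, " ".join(f"{cat}:{role}" for cat, role in pattern))
```

In this toy setting, the frequency of each schematic pattern and of each lemma-pattern pairing is the kind of usage information that a learnt grammar could expose for inspection. A real construction grammar would additionally specify how each pattern maps form to meaning during comprehension and production.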