Emergence of Syntax Needs Minimal Supervision
This paper addresses the theoretical debate on the learnability of syntax, which matters for both linguistics and AI, but it is a purely theoretical contribution without empirical validation.
The paper tackles the problem of learning syntax from a corpus without explicit supervision. It defines and separates grammaticality information from meaning information, and shows that syntax-based lexical categories can be found through a simple optimization process.
This paper is a theoretical contribution to the debate on the learnability of syntax from a corpus without explicit syntax-specific guidance. Our approach starts from the observable structure of a corpus, which we use to define and isolate grammaticality (syntactic information) and meaning/pragmatics information. We describe the formal characteristics of an autonomous syntax and show that it becomes possible to search for syntax-based lexical categories with a simple optimization process, without any prior hypothesis on the form of the model.