CL, LG · Jan 14, 2014

Learning Language from a Large (Unannotated) Corpus

arXiv:1401.3372v1 · 11 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of building language systems without manual annotation, which could benefit AI and NLP researchers. It appears incremental, however, because it builds on prior work such as Link Grammar and statistical language learning methods.

The paper tackles the problem of fully automated, unsupervised extraction of dependency grammars and syntax-to-semantic mappings from large text corpora, aiming to enable natural language comprehension and generation systems directly from unannotated data.

A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.

Code Implementations: 1 repo
