Combining Representation Learning with Logic for Language Processing
This work addresses a bottleneck for NLP researchers and practitioners by potentially making models more data-efficient, though it appears incremental because it builds on existing representation learning methods.
The thesis tackles two problems in natural language processing and knowledge base completion: reducing the need for annotated training data and improving generalization. It does so by combining representation learning with logic, with the aim of leveraging human annotations that are often specified in formal logic.
The current state of the art in many natural language processing and automated knowledge base completion tasks is held by representation learning methods, which learn distributed vector representations of symbols via gradient-based optimization. They require few or no hand-crafted features, thus avoiding the need for most preprocessing steps and task-specific assumptions. However, in many cases representation learning requires a large amount of annotated training data to generalize well to unseen data. Such labeled training data is provided by human annotators, who often use formal logic as the language for specifying annotations. This thesis investigates different combinations of representation learning methods with logic for reducing the need for annotated training data and for improving generalization.