Where's My Head? Definition, Dataset and Models for Numeric Fused-Heads Identification and Resolution
This paper addresses a challenging natural language understanding problem in computational linguistics, but its contribution is incremental, as it focuses on a specific linguistic phenomenon.
The paper tackles the computational treatment of numeric fused-head (NFH) constructions, which are noun phrases whose head noun is missing. It creates datasets for NFH identification and resolution and develops a neural baseline for resolution. It reports highly accurate identification and contributes a 10k-example resolution dataset.
We provide the first computational treatment of fused-head (FH) constructions, focusing on numeric fused-heads (NFH). FH constructions are noun phrases (NPs) in which the head noun is missing and is said to be "fused" with its dependent modifier. This missing information is implicit but important for sentence understanding. Humans fill in the missing references easily, but they pose a challenge for computational models. We formulate the handling of FHs as a two-stage process: identification of the FH construction and resolution of the missing head. We explore the NFH phenomenon in large corpora of English text and create (1) a dataset and a highly accurate method for NFH identification; (2) a crowd-sourced dataset of 10k examples (1M tokens) for NFH resolution; and (3) a neural baseline for the NFH resolution task. We release our code and dataset in the hope of fostering further research into this challenging problem.
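To make the two-stage formulation (identification, then resolution) concrete, here is a toy sketch in Python. It is not the paper's identification method or its neural baseline. The Penn Treebank POS tags, the rule "a cardinal number not followed (after optional adjectives) by a noun or another number is a candidate NFH", the "nearest preceding noun" resolution rule, and the function names `identify_nfh`, `resolve_nfh` and `nfh_pipeline` are all hypothetical, chosen only for illustration.

```python
# Toy two-stage NFH sketch (hypothetical heuristics, not the paper's models).
# Input: a sentence as a list of (token, Penn Treebank POS tag) pairs.

NUMBER_TAGS = {"CD"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ", "JJR", "JJS"}


def identify_nfh(tagged):
    """Stage 1 (assumed heuristic): return indices of cardinal numbers
    whose NP has no overt noun head."""
    candidates = []
    for i, (_, tag) in enumerate(tagged):
        if tag not in NUMBER_TAGS:
            continue
        # Skip adjectives between the number and a possible head ("two big apples").
        j = i + 1
        while j < len(tagged) and tagged[j][1] in MODIFIER_TAGS:
            j += 1
        next_tag = tagged[j][1] if j < len(tagged) else None
        # A following noun means the head is overt; a following number means
        # we are inside a multi-token numeral such as "two hundred".
        if next_tag not in NOUN_TAGS and next_tag not in NUMBER_TAGS:
            candidates.append(i)
    return candidates


def resolve_nfh(tagged, index):
    """Stage 2 (assumed heuristic): pick the nearest preceding noun as the
    missing head, or None when no noun occurs earlier in the sentence."""
    for k in range(index - 1, -1, -1):
        token, tag = tagged[k]
        if tag in NOUN_TAGS:
            return token
    return None


def nfh_pipeline(tagged):
    """Run identification, then resolution, on each candidate."""
    return [(i, resolve_nfh(tagged, i)) for i in identify_nfh(tagged)]


sentence = [("I", "PRP"), ("have", "VBP"), ("three", "CD"), ("apples", "NNS"),
            ("and", "CC"), ("she", "PRP"), ("has", "VBZ"), ("two", "CD"), (".", ".")]
print(nfh_pipeline(sentence))  # [(7, 'apples')]
```

For "I turned 40 .", the sketch flags "40" but returns `None`, since the missing head (an age) never appears in the text. A resolver therefore needs a way to output heads that are not present in the text, which a nearest-noun heuristic cannot provide.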