Part & Whole Extraction: Towards A Deep Understanding of Quantitative Facts for Percentages in Text
This work addresses a domain-specific problem for natural language processing applications like automated infographic generation, but it is incremental as it builds on existing sequence tagging methods.
The paper tackles the extraction of quantitative facts (part and whole) for percentages in text, for example identifying 'like watching football' as the part and 'Americans' as the whole in '30 percent of Americans like watching football'. It reports improved performance from introducing a skip mechanism in sequence tagging.
We study the problem of quantitative fact extraction for text with percentages. For example, given the sentence "30 percent of Americans like watching football, while 20% prefer to watch NBA.", our goal is to obtain a deep understanding of the percentage numbers ("30 percent" and "20%") by extracting their quantitative facts: part ("like watching football" and "prefer to watch NBA") and whole ("Americans"). These quantitative facts can empower new applications like automated infographic generation. We formulate part and whole extraction as a sequence tagging problem. Due to the large gap between a part or whole and its corresponding percentage, we introduce a skip mechanism in sequence modeling and achieve improved performance on both our task and the CoNLL-2003 named entity recognition task. Experimental results demonstrate that learning to skip in sequence tagging is promising.
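To make the sequence tagging formulation concrete, here is a minimal sketch of how the first clause of the example sentence could be labeled and decoded. The BIO scheme, the label names (`PCT`, `WHOLE`, `PART`), and the helper `decode_spans` are illustrative assumptions, not the paper's actual tag set or code, and the skip mechanism itself is not modeled here.

```python
from typing import Dict, List

# Hypothetical BIO labels for the first clause of the example sentence.
# The label names PCT / WHOLE / PART are assumptions made for illustration.
TOKENS = ["30", "percent", "of", "Americans", "like", "watching", "football"]
TAGS = ["B-PCT", "I-PCT", "O", "B-WHOLE", "B-PART", "I-PART", "I-PART"]


def decode_spans(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group BIO-tagged tokens into text spans, keyed by label."""
    spans: Dict[str, List[str]] = {}
    label = None          # label of the span currently being built
    buffer: List[str] = []  # tokens collected for that span

    def flush() -> None:
        # Store the span built so far, if any.
        if label is not None:
            spans.setdefault(label, []).append(" ".join(buffer))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes the previous span and opens a new one.
            flush()
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag with a matching label extends the open span.
            buffer.append(token)
        else:
            # "O" (or an I- tag with no matching open span) closes the span.
            flush()
            label, buffer = None, []
    flush()
    return spans


facts = decode_spans(TOKENS, TAGS)
print(facts)
```

In this sketch the whole ("Americans") and the part ("like watching football") are recovered as separate spans alongside the percentage they describe. How tags are tied to a particular percentage when a sentence contains several is left open here.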