Improving Semantic Composition with Offset Inference
This addresses a specific bottleneck in NLP for researchers working on semantic models, but it appears incremental as it builds directly on APTs.
The paper tackles the sparsity problem in count-based distributional semantic models, especially for Anchored Packed Trees (APTs), by introducing a novel distributional inference method that leverages type structure to infer missing co-occurrences, resulting in improved semantic composition.
Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models like Anchored Packed Trees (APTs), that take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition.