CL, Mar 21, 2020

A Joint Approach to Compound Splitting and Idiomatic Compound Detection

arXiv:2003.09606v1996 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific problem in German natural language processing, with benefits for applications such as machine translation and information retrieval. It is incremental, however: it builds on existing methods and reports a modest performance gain.

The paper tackles the problem of processing noun compounds in German by developing a joint deep learning approach for splitting compounds and detecting idiomatic ones, resulting in a neural splitter that outperforms the state of the art by about 5%.

Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds, which are one possible source of out-of-vocabulary (OOV) words. In-depth processing of noun compounds requires not only splitting them into smaller components (or even roots) but also identifying instances that should remain unsplit because they are idiomatic. We develop a two-fold deep learning-based approach to noun compound splitting and idiomatic compound detection for the German language, which we train on a newly collected corpus of annotated German compounds. Our neural noun compound splitter operates on a sub-word level and outperforms the current state of the art by about 5%.
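To make the two sub-tasks concrete, the following is a toy sketch of the task's input and output only. It is a dictionary-based splitter, not the neural sub-word model the abstract describes. The vocabulary, the idiomatic list, the linking-element subset, and the function names are all hypothetical and chosen for illustration.

```python
from typing import List, Optional, Set

# Toy subset of German linking elements (assumption, not from the paper).
LINKERS = ("", "s", "es", "n", "en")


def _split(w: str, vocab: Set[str]) -> Optional[List[str]]:
    """Return the parts of w if it can be covered by vocab words, else None."""
    if w in vocab:
        return [w]
    # Try the longest possible head first.
    for i in range(len(w) - 1, 0, -1):
        head, rest = w[:i], w[i:]
        if head not in vocab:
            continue
        for link in LINKERS:
            # Skip an optional linking element, then split the remainder.
            if rest.startswith(link) and len(rest) > len(link):
                tail = _split(rest[len(link):], vocab)
                if tail is not None:
                    return [head] + tail
    return None


def split_compound(word: str, vocab: Set[str], idiomatic: Set[str]) -> List[str]:
    """Split a compound unless it is listed as idiomatic."""
    w = word.lower()
    if w in idiomatic:
        return [w]  # idiomatic compounds stay whole
    parts = _split(w, vocab)
    return parts if parts is not None else [w]


# Hypothetical toy data.
vocab = {"haus", "tür", "arbeit", "zimmer", "löwe", "zahn"}
idiomatic = {"löwenzahn"}  # "dandelion", literally "lion's tooth"

print(split_compound("Haustür", vocab, idiomatic))        # ['haus', 'tür']
print(split_compound("Arbeitszimmer", vocab, idiomatic))  # ['arbeit', 'zimmer']
print(split_compound("Löwenzahn", vocab, idiomatic))      # ['löwenzahn']
```

Without the idiomatic list, "Löwenzahn" would be split into "löwe" and "zahn", which loses the meaning "dandelion". This is the kind of case that the idiomatic detection step is meant to catch.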

