Probing BERT for German Compound Semantics
This work examines how German language models represent the semantics of noun compounds, a question relevant to both linguists and NLP researchers. Its contribution is incremental: it extends prior English-focused probing methods to German.
This paper investigates how well German BERT models encode semantic knowledge of noun compounds by predicting the compositionality of 868 gold standard compounds. Compositionality information is most easily recoverable in the early layers, but the strongest results lag behind those reported for English, possibly because compounding is more productive in German and constituents are correspondingly more ambiguous.
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
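The evaluation described above can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that compositionality is predicted as the cosine similarity between a compound vector and a constituent vector, and that predictions are scored against gold ratings with Spearman's rank correlation, as is common in related probing work. The function names, the configuration keys (layer, token choice, casing), and the toy vectors and ratings are all hypothetical placeholders, not the paper's data or code; in practice the vectors would come from the hidden states of a German BERT model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate_grid(embeddings, gold):
    """Score every configuration against the gold compositionality ratings.

    embeddings: {config: {compound: (compound_vec, constituent_vec)}}
    gold:       {compound: rating}
    Returns {config: Spearman's rho}.
    """
    compounds = sorted(gold)
    gold_scores = [gold[c] for c in compounds]
    results = {}
    for config, vectors in embeddings.items():
        predicted = [cosine(*vectors[c]) for c in compounds]
        results[config] = spearman(predicted, gold_scores)
    return results


# Made-up toy input: three placeholder compounds, two hypothetical configurations.
gold = {"compound_a": 1.0, "compound_b": 3.5, "compound_c": 5.0}
embeddings = {
    (1, "compound_tokens", "cased"): {
        "compound_a": ([1.0, 0.0], [0.0, 1.0]),
        "compound_b": ([1.0, 0.0], [1.0, 1.0]),
        "compound_c": ([1.0, 0.0], [1.0, 0.0]),
    },
    (12, "cls_token", "cased"): {
        "compound_a": ([1.0, 0.0], [1.0, 0.0]),
        "compound_b": ([1.0, 0.0], [1.0, 1.0]),
        "compound_c": ([1.0, 0.0], [0.0, 1.0]),
    },
}
scores = evaluate_grid(embeddings, gold)
best_config = max(scores, key=scores.get)
```

Scoring each configuration separately makes it straightforward to compare layers, target tokens, and cased versus uncased models on the same gold standard; the toy values here only exercise the code and say nothing about the paper's actual results.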