Unified Representation for Non-compositional and Compositional Expressions
This work addresses the challenge of processing potentially idiomatic expressions (PIEs) in natural language understanding, and it represents an incremental improvement over existing representation models for such expressions.
The paper tackles the problem of representing non-compositional language expressions by proposing PIER, a model based on BART. PIER achieves a 33% higher homogeneity score for embedding clustering than BART, and gains of 3.12% in accuracy for PIE sense classification and 3.29% in sequence accuracy for span detection over the state-of-the-art model GIEA, while staying within +/- 1% of BART's accuracy on NLU tasks.
Accurate processing of non-compositional language relies on generating good representations for such expressions. In this work, we study the representation of language non-compositionality by proposing a language model, PIER, that builds on BART and can create semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs). PIEs are characterized by their non-compositionality and by contextual ambiguity between their literal and idiomatic interpretations. Via intrinsic evaluation on embedding quality and extrinsic evaluation on PIE processing and NLU tasks, we show that representations generated by PIER yield a 33% higher homogeneity score for embedding clustering than BART, as well as gains of 3.12% in accuracy for PIE sense classification and 3.29% in sequence accuracy for span detection over the state-of-the-art idiomatic expression (IE) representation model, GIEA. These gains are achieved without sacrificing PIER's performance on NLU tasks (+/- 1% accuracy) compared to BART.
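To make the intrinsic metric concrete, the following is a minimal sketch of how a homogeneity score for a clustering can be computed, using the standard definition 1 - H(C|K)/H(C), where C are gold classes and K are cluster assignments. It is an illustration only, not the evaluation code used for PIER: the function names, the toy sense labels, and the cluster assignments are all hypothetical, and the step of clustering the embeddings is assumed to have already happened.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(classes, clusters):
    """H(classes | clusters): class uncertainty remaining inside each cluster."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))
    cluster_sizes = Counter(clusters)
    return -sum(
        (c / n) * math.log(c / cluster_sizes[k]) for (k, _), c in joint.items()
    )


def homogeneity(classes, clusters):
    """Homogeneity = 1 - H(C|K) / H(C); 1.0 means every cluster holds one class."""
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    h_c = entropy(classes)
    if h_c == 0.0:
        # Only one gold class: any clustering is trivially homogeneous.
        return 1.0
    return 1.0 - conditional_entropy(classes, clusters) / h_c


# Hypothetical toy data: gold senses of four PIE occurrences and two
# candidate cluster assignments of their embeddings.
gold = ["idiomatic", "idiomatic", "literal", "literal"]
pure_clusters = [0, 0, 1, 1]   # each cluster contains a single sense
mixed_clusters = [0, 1, 0, 1]  # each cluster mixes both senses

print(homogeneity(gold, pure_clusters))
print(homogeneity(gold, mixed_clusters))
```

A higher score indicates that embeddings grouped into the same cluster tend to share the same gold label, which is the sense in which a 33% higher homogeneity score reflects better-separated representations.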