CL | AI | LG | May 5, 2024

Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models

arXiv:2405.02861v1 | 1 citation | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides a comprehensive evaluation suite for improving language models' semantic phrase comprehension, though it is incremental as it builds on existing benchmarking efforts.

The authors introduced LexBench, a benchmark for evaluating language models on ten semantic phrase processing tasks, finding that large models generally outperform smaller ones and that strong models achieve human-level performance in some tasks.

We introduce LexBench, a comprehensive evaluation suite designed to test language models (LMs) on ten semantic phrase processing tasks. Unlike prior studies, it is the first work to propose a framework from a comparative perspective that models the general semantic phrase (i.e., lexical collocation) and three fine-grained semantic phrases: idiomatic expressions, noun compounds, and verbal constructions. Using LexBench, we assess the performance of 15 LMs across model architectures and parameter scales in classification, extraction, and interpretation tasks. Through the experiments, we first validate the scaling law and find that, as expected, large models outperform smaller ones in most tasks. Second, we investigate scaling further on semantic relation categorization and find that few-shot LMs still lag behind vanilla fine-tuned models on this task. Third, through human evaluation, we find that the performance of strong models is comparable to the human level in semantic phrase processing. Our benchmarking findings can serve future research aiming to improve the generic capability of LMs on semantic phrase comprehension. Our source code and data are available at https://github.com/jacklanda/LexBench

