Does It Capture STEL? A Modular, Similarity-based Linguistic Style Evaluation Framework
This paper addresses the need for better evaluation of style measures in natural language processing, though its contribution is incremental in that it builds on existing methods.
The authors tackle the problem of evaluating linguistic style measures by proposing STEL, a modular, content-controlled framework, and find that BERT-based methods outperform simple versions of commonly used style measures such as 3-grams, punctuation frequency and LIWC-based approaches.
Style is an integral part of natural language. However, evaluation methods for style measures are rare, often task-specific and usually do not control for content. We propose the modular, fine-grained and content-controlled similarity-based STyle EvaLuation framework (STEL) to test the performance of any model that can compare two sentences on style. We illustrate STEL with two general dimensions of style (formal/informal and simple/complex) as well as two specific characteristics of style (contrac'tion and numb3r substitution). We find that BERT-based methods outperform simple versions of commonly used style measures like 3-grams, punctuation frequency and LIWC-based approaches. We invite the addition of further tasks and task instances to STEL and hope to facilitate the improvement of style-sensitive measures.
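To make concrete what "any model that can compare two sentences on style" might look like, the following is a minimal sketch, not the authors' implementation. It assumes a character 3-gram cosine similarity as the style measure and a hypothetical task format in which two anchor sentences of different styles must be matched to two alternative sentences. The function names (`ngram_style_similarity`, `same_order`) and the decision rule of comparing summed similarities are illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a sentence."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ngram_style_similarity(s1, s2):
    """A simple 3-gram style measure: higher means more similar."""
    return cosine(char_ngrams(s1), char_ngrams(s2))


def same_order(sim, anchor1, anchor2, alt1, alt2):
    """Assumed decision rule: return True if the measure pairs alt1 with
    anchor1 and alt2 with anchor2, False if it prefers the crossed pairing."""
    straight = sim(anchor1, alt1) + sim(anchor2, alt2)
    crossed = sim(anchor1, alt2) + sim(anchor2, alt1)
    return straight >= crossed


if __name__ == "__main__":
    # Hypothetical contraction instance; the sentences are made up for illustration.
    prediction = same_order(
        ngram_style_similarity,
        "I don't know what he wants.",
        "I do not know what he wants.",
        "She isn't coming to the party.",
        "She is not coming to the party.",
    )
    print("prefers same-order pairing:", prediction)
```

Because `same_order` accepts any callable that maps two sentences to a score, a different similarity function can be substituted for the 3-gram measure without changing the evaluation logic, which is one way to read the modularity the abstract describes.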