Categorizing Comparative Sentences
This work addresses the need to extract comparative sentences in support of pro/con argumentation in search engines and debating technologies, but its contribution is incremental: it applies existing methods to a newly annotated dataset.
The paper tackles the problem of automatically identifying comparative sentences and categorizing the preference they express (e.g., 'Python has better NLP libraries than MATLAB'). The authors manually annotate 7,199 sentences and report an F1 score of 85% with a gradient boosting model based on pre-trained sentence embeddings.
We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27% of the sentences contain an oriented comparison in the sense of "better" or "worse"). A gradient boosting model based on pre-trained sentence embeddings reaches an F1 score of 85% in our experimental evaluation. The model can be used to extract comparative sentences for pro/con argumentation in comparative/argument search engines or debating technologies.
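The task formulation above can be illustrated with a minimal sketch, assuming a three-way label per sentence and target pair: two oriented classes ("better", "worse") plus a catch-all for sentences without an oriented comparison. The label names `BETTER`, `WORSE`, `NONE` and the helpers `to_triple` and `f1_scores` are hypothetical, and the classifier itself (gradient boosting over pre-trained sentence embeddings) is not reproduced. Since the abstract does not state how the 85% F1 is averaged, the sketch computes both micro- and macro-averaged F1.

```python
from collections import Counter
from typing import NamedTuple, Optional

# Assumed label set (hypothetical names): two oriented classes plus NONE
# for sentences that mention both items without an oriented comparison.
LABELS = ("BETTER", "WORSE", "NONE")


class Comparison(NamedTuple):
    first: str
    preference: str  # "better" or "worse", read as: first is <preference> than second
    second: str


def to_triple(obj_a: str, obj_b: str, label: str) -> Optional[Comparison]:
    """Map a predicted label for the pair (obj_a, obj_b) to a triple.

    Returns None when the sentence holds no oriented comparison."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    if label == "NONE":
        return None
    return Comparison(obj_a, label.lower(), obj_b)


def f1_scores(gold: list, predicted: list):
    """Return per-class F1, micro-averaged F1, and macro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[g] += 1  # gold class g gains a false negative
    per_class = {}
    for label in LABELS:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    macro = sum(per_class.values()) / len(LABELS)
    # With exactly one gold and one predicted label per sentence,
    # micro-averaged F1 reduces to accuracy.
    micro = sum(tp.values()) / len(gold) if gold else 0.0
    return per_class, micro, macro


if __name__ == "__main__":
    print(to_triple("Python", "MATLAB", "BETTER"))
    # Comparison(first='Python', preference='better', second='MATLAB')
```

The macro average weights the two rare oriented classes as heavily as `NONE`, which matters here because only 27% of the annotated sentences contain an oriented comparison; a micro average is dominated by the majority class.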