CL, IR | Oct 24, 2022

Investigating the detection of Tortured Phrases in Scientific Literature

Puthineath Lay, Martin Lentschat, Cyril Labbé

arXiv:2210.13024v1

Originality: Synthesis-oriented

AI Analysis

This paper addresses the generation of pseudo-scientific articles by unscrupulous authors, though it appears incremental because it builds on prior work on tortured phrases.

The study tackles the problem of detecting 'tortured phrases' (nonsensical expressions produced by tools that paraphrase scientific texts) in scientific literature. It reports noticeable results from experiments with non-neural binary classification, neural binary classification, and cosine similarity comparisons.

With the help of online tools, unscrupulous authors can today generate a pseudo-scientific article and attempt to publish it. Some of these tools work by replacing or paraphrasing existing texts to produce new content, but they tend to generate nonsensical expressions. A recent study introduced the concept of the 'tortured phrase', an unexpected odd phrase that appears in place of a fixed expression, e.g. 'counterfeit consciousness' instead of 'artificial intelligence'. The present study investigates how tortured phrases that are not yet listed can be detected automatically. We conducted several experiments, including non-neural binary classification, neural binary classification, and cosine similarity comparison of the phrase tokens, yielding noticeable results.
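The cosine similarity comparison of phrase tokens mentioned in the abstract could look roughly like the sketch below. It is only an illustration of the general idea, not the authors' method: the abstract does not say which embeddings, alignment, or thresholds were used. The vectors in `TOY_VECTORS` are invented, and the helper `phrase_similarity` and its position-by-position alignment are hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional token vectors, invented for illustration only.
# A real experiment would use embeddings from a trained model.
TOY_VECTORS = {
    "artificial": [0.9, 0.1, 0.0],
    "counterfeit": [0.8, 0.3, 0.1],
    "intelligence": [0.1, 0.9, 0.2],
    "consciousness": [0.2, 0.8, 0.3],
    "banana": [0.0, 0.1, 0.9],
}


def phrase_similarity(candidate, reference, vectors=TOY_VECTORS):
    """Mean cosine similarity of position-aligned tokens in two phrases.

    Assumption: both phrases have the same number of tokens and every
    token has a vector. This alignment is a simplification for the sketch.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if len(cand) != len(ref):
        raise ValueError("phrases must have the same number of tokens")
    sims = [cosine(vectors[c], vectors[r]) for c, r in zip(cand, ref)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    for phrase in ("counterfeit consciousness", "banana banana"):
        score = phrase_similarity(phrase, "artificial intelligence")
        print(f"{phrase!r} vs 'artificial intelligence': {score:.3f}")
```

With these toy vectors, a phrase whose tokens are near-synonyms of the fixed expression scores close to 1, while an unrelated phrase scores much lower. A high token-level similarity to a known expression, combined with the phrase not being that expression, is the kind of signal such a comparison could flag as a candidate tortured phrase.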
