CL, Jul 13, 2017

Is writing style predictive of scientific fraud?

arXiv:1707.04095v11088 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving fraud detection in scientific publications for researchers and publishers, but it is incremental as it refines existing approaches without major breakthroughs.

The study revisited prior research on detecting scientific fraud via machine learning, finding that the original method overestimated predictability and that simpler models could outperform it, but more abstract linguistic features yielded negative results.

The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments, and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.
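
For readers unfamiliar with the leave-one-out procedure mentioned in the abstract, the following is a minimal sketch of how such splits are generated, shown next to k-fold splits for contrast. It is not the authors' code: the function names are hypothetical, and the abstract does not state which alternative evaluation protocol the authors used, so the k-fold variant is purely an assumption for illustration.

```python
from typing import Iterator, List, Tuple

# A split is a pair of index lists: (training indices, test indices).
Split = Tuple[List[int], List[int]]


def leave_one_out_splits(n: int) -> Iterator[Split]:
    """Yield n splits; each holds out exactly one item for testing."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def k_fold_splits(n: int, k: int) -> Iterator[Split]:
    """Yield k splits over contiguous folds of near-equal size.

    Hypothetical contrast case, not taken from the paper.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional item.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size
```

With leave-one-out, a corpus of n documents yields n models, each trained on n - 1 documents and tested on the remaining one; with k-fold, each model is tested on a larger held-out group.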

