False perfection in machine prediction: Detecting and assessing circularity problems in machine learning
This work tackles a critical methodological flaw in ML evaluation that affects both researchers and practitioners, with the aim of making results more reliable and interpretable, though it appears to be an incremental contribution based on existing book content.
The paper addresses circularity in machine learning prediction, a problem that can produce misleadingly high performance metrics, and provides methods for detecting and assessing it to improve model validity.
This paper is an excerpt from an early version of Chapter 2 of the book "Validity, Reliability, and Significance. Empirical Methods for NLP and Data Science" by Stefan Riezler and Michael Hagmann, published in December 2021 by Morgan & Claypool. For a more recent and comprehensive discussion, please see the book's homepage at https://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1688.