Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models
This library addresses the problem of overestimated results and incorrect interpretations in ML research, particularly for researchers in fields such as neuroscience, by providing built-in guards against common pitfalls. The contribution is incremental, since it builds on existing evaluation techniques.
The authors tackled the challenge of leakage-free evaluation and inspection of machine learning models by developing Julearn, an open-source Python library that simplifies the design and evaluation of complex ML pipelines, as demonstrated through three previously-published research examples.
The fast-paced development of machine learning (ML) methods, coupled with their increasing adoption in research, poses challenges for researchers without extensive training in ML. In neuroscience, for example, ML can help understand brain-behavior relationships, diagnose diseases, and develop biomarkers using data sources such as magnetic resonance imaging and electroencephalography. The primary objective of ML is to build models that can make accurate predictions on unseen data. Researchers aim to prove the existence of such generalizable models by evaluating performance with techniques such as cross-validation (CV), which uses systematic subsampling to estimate generalization performance. Choosing a CV scheme and evaluating an ML pipeline can be challenging, and doing either improperly can lead to overestimated results and incorrect interpretations. We created julearn, an open-source Python library that allows researchers to design and evaluate complex ML pipelines without encountering common pitfalls. In this manuscript, we present the rationale behind julearn's design and its core features, and showcase three examples of previously published research projects that can be easily implemented using this novel library. Julearn aims to simplify entry into the ML world by providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls. With its design, unique features, and simple interface, it is a useful Python-based library for research projects.
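To make the pitfall concrete, the following is a minimal sketch of leakage-free k-fold cross-validation written with the Python standard library only. It is not julearn's API: every function name here (`k_fold_indices`, `fit_scaler`, `apply_scaler`, `fit_centroids`, `predict`, `cross_validate`) is hypothetical, and the nearest-centroid classifier and synthetic data are assumptions chosen purely for illustration. The point it shows is that preprocessing statistics (here, per-feature mean and standard deviation) are learned from the training fold alone and then applied unchanged to the held-out fold.

```python
import random
import statistics


def k_fold_indices(n_samples, n_splits, seed=0):
    """Shuffle sample indices once and split them into n_splits disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from the given rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Replace a zero standard deviation with 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, scaler):
    """Z-score rows using statistics that were learned elsewhere."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_centroids(rows, labels):
    """Toy nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return centroids


def predict(rows, centroids):
    """Assign each row to the class with the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(row, centroids[c])) for row in rows]


def cross_validate(X, y, n_splits=5, seed=0):
    """Return one accuracy per fold; all fitting uses training rows only."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_splits, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]

        # The scaler never sees the held-out rows, so no test information
        # leaks into preprocessing or model fitting.
        scaler = fit_scaler(X_train)
        model = fit_centroids(apply_scaler(X_train, scaler), y_train)
        y_pred = predict(apply_scaler(X_test, scaler), model)
        scores.append(sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test))
    return scores


# Synthetic two-class data (an assumption for illustration only):
# the first feature carries the class signal, the second is noise.
rng = random.Random(42)
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 5.0)]
     for label in (0, 1) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

scores = cross_validate(X, y)
print(f"mean accuracy over {len(scores)} folds: {statistics.fmean(scores):.2f}")
```

The leaky variant would call `fit_scaler(X)` once on the full dataset before splitting. With simple z-scoring the resulting bias is often small, but the same mistake with feature selection or confound removal can inflate performance estimates substantially, which is the kind of error a pipeline-aware evaluation routine is meant to rule out.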