Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage
This tool addresses the need for efficient performance assessment of machine learning estimators through simulation studies. Its contribution is incremental: it builds on existing methods and adds a new storage implementation.
The authors introduce sstudy, a Python package that simplifies the preparation of simulation studies by using SQL databases for storage, and they cover its basic features, usage examples, and documentation references.
Performance assessment is a key issue when proposing new machine learning or statistical estimators. One way to carry out this task is through simulation studies, which estimate and compare properties (such as predictive power) of estimators (and other statistics) by averaging over many replications under a given true distribution: generate a dataset, fit the estimator, compute and store its predictive power, repeat the procedure many times, and finally average the stored predictive powers. In this paper, we present sstudy, a Python package designed to simplify the preparation of simulation studies by using SQL database engines as the storage system. More specifically, we present its basic features, usage examples, and references to its documentation. We also present a short statistical description of the simulation study procedure, with a simplified explanation of what it estimates, as well as some examples of applications.
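The procedure described above (generate, fit, score, store, repeat, average) can be sketched in a few lines of plain Python. This is a minimal illustration that uses only the standard library and does not use the sstudy API. The true model (y = 2x + noise), the two toy estimators, the `results` table layout, and all function names are assumptions introduced here for illustration only.

```python
import random
import sqlite3


def generate_dataset(rng, n, slope=2.0, noise_sd=1.0):
    """Draw n pairs (x, y) from the assumed true model y = slope * x + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_least_squares(xs, ys):
    """Least-squares slope for a line through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_zero(xs, ys):
    """Baseline estimator that ignores the data (slope 0, so it predicts 0)."""
    return 0.0


def mean_squared_error(slope_hat, xs, ys):
    """Predictive power measured as mean squared error on (xs, ys)."""
    return sum((y - slope_hat * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_study(conn, n_replications=200, n_train=50, n_test=200, seed=0):
    """Run all replications, store each result in SQL, return per-estimator averages."""
    # Hypothetical table layout: one row per (estimator, replication).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "estimator TEXT, replication INTEGER, mse REAL, "
        "PRIMARY KEY (estimator, replication))"
    )
    estimators = {"least_squares": fit_least_squares, "zero": fit_zero}
    rng = random.Random(seed)
    for rep in range(n_replications):
        train = generate_dataset(rng, n_train)   # generate a dataset
        test = generate_dataset(rng, n_test)     # fresh data from the same distribution
        for name, fit in estimators.items():
            slope_hat = fit(*train)                       # fit the estimator
            mse = mean_squared_error(slope_hat, *test)    # compute predictive power
            conn.execute(                                 # store it
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
                (name, rep, mse),
            )
    conn.commit()
    # Average the stored values, one average per estimator.
    rows = conn.execute(
        "SELECT estimator, AVG(mse) FROM results GROUP BY estimator"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")  # swap for a file path to persist results
averages = run_study(conn)
for name, avg_mse in sorted(averages.items()):
    print(f"{name}: average test MSE = {avg_mse:.3f}")
```

Keeping one row per (estimator, replication) in the database means that a rerun overwrites the stored results instead of duplicating them, and that the final averages can be recomputed at any time with a single query. The sketch does not skip replications that are already stored, so resuming an interrupted study would need an extra check before each fit.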