Machine Learning using Stata/Python
This work provides a tool for researchers and practitioners in fields such as economics and the social sciences who use Stata, allowing them to apply machine learning methods without switching software. The contribution is incremental, since it builds on existing Python libraries.
The authors developed two Stata modules, r_ml_stata and c_ml_stata, to integrate popular machine learning methods for regression and classification into Stata using Python's Scikit-learn API, enabling hyperparameter tuning via K-fold cross-validation with grid search.
We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting popular Machine Learning (ML) methods in both regression and classification settings. Using the Stata/Python integration platform (sfi) introduced in Stata 16, these commands provide optimal tuning of hyperparameters via K-fold cross-validation with grid search. More specifically, they use the Python Scikit-learn API to carry out both cross-validation and outcome/label prediction.
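To make the tuning step concrete, the following is a minimal sketch of K-fold cross-validation with grid search in Scikit-learn, followed by outcome prediction. It is not the code of r_ml_stata or c_ml_stata. The synthetic data, the choice of a regression tree as learner, the grid of max_depth values, and the use of 5 folds are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative assumption, not from the paper).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate hyperparameter values to search over (assumed grid).
param_grid = {"max_depth": [2, 4, 6, 8]}

# K-fold splitter with K = 5 (assumed); shuffling is seeded for reproducibility.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Grid search: each candidate is scored by its mean cross-validated
# negative mean squared error, and the best one is refit on all the data.
search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid=param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_   # hyperparameters selected by cross-validation
predictions = search.predict(X)     # outcome prediction from the refit best model

print(best_params)
```

For a classification setting, the same pattern would apply with a classifier such as DecisionTreeClassifier and a classification metric such as "accuracy" in place of the regression scoring rule.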