The autofeat Python Library for Automated Feature Engineering and Selection
The library addresses the need for transparent, efficient predictive models in business contexts where interpretability is crucial, although the contribution is incremental in that it builds on existing linear methods.
The paper introduces the autofeat Python library, which automates feature engineering and selection to improve the prediction accuracy of linear models while preserving their interpretability, with the improvement demonstrated in benchmarks.
This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such as neural networks, are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. Linear models are efficient and intuitive, but they generally achieve lower prediction accuracy. Our library provides a multi-step feature engineering and selection process: first, a large pool of non-linear features is generated; then, a small and robust set of meaningful features is selected from this pool. The selected features improve the prediction accuracy of a linear model while retaining its interpretability.
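The two-stage idea in the abstract (generate a large pool of non-linear features, then keep a small subset for a linear model) can be illustrated with a minimal, standard-library-only sketch. This is not autofeat's implementation or API. The function names `generate_features` and `select_and_fit` are hypothetical, the set of transformations is an arbitrary choice, and greedy forward stagewise regression is used here only as a simple stand-in for the library's own selection procedure. The sketch also assumes all inputs are strictly positive so that the logarithm, square root and reciprocal are defined.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def generate_features(columns):
    """Step 1: expand raw columns into a pool of non-linear candidates.

    Assumption: every input value is strictly positive.
    """
    pool = {name: list(values) for name, values in columns.items()}
    for name, values in columns.items():
        pool[f"{name}**2"] = [v ** 2 for v in values]
        pool[f"sqrt({name})"] = [math.sqrt(v) for v in values]
        pool[f"log({name})"] = [math.log(v) for v in values]
        pool[f"1/{name}"] = [1.0 / v for v in values]
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pool[f"{a}*{b}"] = [p * q for p, q in zip(columns[a], columns[b])]
    return pool


def select_and_fit(pool, target, max_features=3, tol=1e-10):
    """Step 2: greedy forward stagewise selection (an illustrative stand-in).

    Repeatedly picks the candidate most correlated with the current residual,
    regresses the residual on it, and stops once the residual is negligible.
    Returns the intercept and a {feature name: coefficient} dict, i.e. a
    linear model over the few selected features.
    """
    intercept = mean(target)
    residual = [t - intercept for t in target]
    total_ss = sum(r * r for r in residual)
    coefs = {}
    for _ in range(max_features):
        if sum(r * r for r in residual) <= tol * total_ss:
            break  # nothing left to explain
        scores = {name: abs(correlation(values, residual))
                  for name, values in pool.items() if name not in coefs}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] == 0.0:
            break
        xs = pool[best]
        mx = mean(xs)
        slope = (sum((x - mx) * r for x, r in zip(xs, residual))
                 / sum((x - mx) ** 2 for x in xs))
        residual = [r - slope * (x - mx) for x, r in zip(xs, residual)]
        coefs[best] = slope
        intercept -= slope * mx
    return intercept, coefs


# Synthetic data: the target depends on the product of the two raw inputs,
# which a linear model on x1 and x2 alone cannot represent exactly.
random.seed(0)
raw = {
    "x1": [random.uniform(1.0, 5.0) for _ in range(200)],
    "x2": [random.uniform(1.0, 5.0) for _ in range(200)],
}
y = [2.0 * a * b + 1.0 for a, b in zip(raw["x1"], raw["x2"])]

pool = generate_features(raw)
intercept, coefs = select_and_fit(pool, y)
print(intercept, coefs)
```

The fitted model remains a plain weighted sum of a few named features, so each coefficient can be read directly, which is the interpretability property the abstract emphasizes. On real, noisy data a stagewise procedure like this one can pick correlated near-duplicates, which is one reason a more robust selection step is needed in practice.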