LG, Jan 26, 2017

An Empirical Analysis of Feature Engineering for Predictive Modeling

arXiv:1701.07852v2, 249 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the manual, time-consuming task of feature engineering for practitioners. Its contribution is incremental: it builds on existing empirical analysis without introducing new methods.

This paper tackles the problem of determining which types of engineered features are best suited for different machine learning models, finding that models perform differently with various feature types and can sometimes synthesize needed features on their own.

Machine learning models, such as neural networks, decision trees, random forests, and gradient boosting machines, accept a feature vector, and provide a prediction. These models learn in a supervised fashion where we provide feature vectors mapped to the expected output. It is common practice to engineer new features from the provided feature set. Such engineered features will either augment or replace portions of the existing feature vector. These engineered features are essentially calculated fields based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different kinds of engineered features. This paper reports empirical research to demonstrate what kinds of engineered features are best suited to various machine learning model types. We provide this recommendation by generating several datasets that we designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model can synthesize the needed feature on its own. If a model can synthesize a planned feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features.
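The experimental idea in the abstract can be sketched in a few lines: build a dataset whose target depends on one specific engineered feature, then check how well a model does without being given that feature. The sketch below is only an illustration under several assumptions. It uses NumPy and an ordinary least-squares fit as a stand-in model (not one of the model types the paper studies), a ratio feature as the example, and made-up value ranges and function names (`make_ratio_dataset`, `holdout_rmse`).

```python
import numpy as np


def make_ratio_dataset(n_rows=2000, seed=0):
    """Synthetic data whose target is exactly the ratio x1 / x2 (assumed example)."""
    rng = np.random.default_rng(seed)
    # Draw both features from [1, 10] so the denominator stays away from zero.
    x = rng.uniform(1.0, 10.0, size=(n_rows, 2))
    y = x[:, 0] / x[:, 1]
    return x, y


def holdout_rmse(features, target, n_train):
    """Fit least squares on the first n_train rows; return RMSE on the rest."""
    # Append a column of ones so the fit includes an intercept term.
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design[:n_train], target[:n_train], rcond=None)
    residual = design[n_train:] @ coef - target[n_train:]
    return float(np.sqrt(np.mean(residual ** 2)))


x, y = make_ratio_dataset()
ratio = (x[:, 0] / x[:, 1]).reshape(-1, 1)  # the engineered feature

# Stand-in model (linear least squares), with and without the engineered feature.
rmse_raw = holdout_rmse(x, y, n_train=1500)
rmse_engineered = holdout_rmse(np.hstack([x, ratio]), y, n_train=1500)

print(f"RMSE, raw features only:     {rmse_raw:.4f}")
print(f"RMSE, with engineered ratio: {rmse_engineered:.4f}")
```

A large gap between the two errors suggests the model cannot synthesize the ratio on its own, so supplying it helps. A small gap would suggest the engineered feature is unnecessary for that model. Swapping in one of the model types named in the abstract would follow the same pattern.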

Code Implementations: 1 repo
