LG · Mar 11, 2024

Constructing Variables Using Classifiers as an Aid to Regression: An Empirical Assessment

arXiv:2403.06829v2 · 2 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work aims to help data scientists improve regression accuracy by providing a generic pre-processing tool. It appears incremental, since it builds on existing classification and regression techniques.

The paper aims to improve regression performance by automatically creating complementary variables in a pre-processing step: the target variable is discretized into intervals, classifiers are trained to predict whether the target falls at or below each resulting threshold, and their outputs are concatenated to the input vector. The method was tested with 5 types of regressors on 33 datasets, and the experimental results support its effectiveness.

This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals, which are then used to define value thresholds. Classifiers are then trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The outputs of the classifiers are concatenated into an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it on 33 regression datasets. Our experimental results confirm the value of the approach.
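The enrichment step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`quantile_thresholds`, `threshold_labels`, `fit_logistic`, `enrich`) are hypothetical, equal-frequency cut points are an assumed choice of discretization, and a tiny hand-written logistic regression stands in for whatever classifiers the paper actually uses.

```python
import math
import statistics


def quantile_thresholds(y, n_thresholds):
    # Assumption: equal-frequency cut points. The abstract does not say
    # which discretization the paper uses.
    return statistics.quantiles(y, n=n_thresholds + 1)


def threshold_labels(y, thresholds):
    # One binary label per threshold: 1 if the target is <= the threshold.
    return [[1 if value <= t else 0 for t in thresholds] for value in y]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, labels, lr=0.1, epochs=200):
    # Stand-in classifier (assumption): logistic regression trained by SGD.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, labels):
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_proba(model, x):
    # Estimated probability that the target is <= the model's threshold.
    w, b = model
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def enrich(X_train, y_train, X_new, n_thresholds=3):
    # Train one classifier per threshold, then append each classifier's
    # output to every input vector in X_new.
    thresholds = quantile_thresholds(y_train, n_thresholds)
    labels = threshold_labels(y_train, thresholds)
    models = [
        fit_logistic(X_train, [row[j] for row in labels])
        for j in range(len(thresholds))
    ]
    return [list(x) + [predict_proba(m, x) for m in models] for x in X_new]
```

Under these assumptions, calling `enrich(X_train, y_train, X_new)` returns each input vector extended with one extra variable per threshold, and any regressor could then be trained on the enriched vectors in place of the original ones.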

Code Implementations: 1 repo
