LG · Nov 10, 2023

Symbolic Regression as Feature Engineering Method for Machine and Deep Learning Regression Tasks

arXiv:2311.06028v1 · 24 citations · h-index: 17
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for interpretable and effective feature engineering in machine and deep learning regression, particularly for physics-related applications. It is incremental, however: it applies an existing method (symbolic regression) in a new context.

The study tackles feature engineering for regression tasks by integrating symbolic regression as a pre-processing step. This yields RMSE improvements of 34-86% on synthetic datasets and 4-11.5% on real-world datasets, and more than 20% in a case study on predicting superconducting critical temperatures.
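The percentages above are RMSE improvements. Assuming they denote the relative reduction in test RMSE with respect to a baseline model trained without SR-derived features (the summary does not spell out the definition), they can be written as follows, where the labels RMSE_base, RMSE_SR and Δ_RMSE are notation introduced here for illustration:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\Delta_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{base}} - \mathrm{RMSE}_{\mathrm{SR}}}{\mathrm{RMSE}_{\mathrm{base}}} \times 100\%
```

Under this reading, a hypothetical baseline RMSE of 2.0 reduced to 1.0 is a 50% improvement, and a 34% improvement means the SR-augmented model's RMSE is 0.66 times that of the baseline.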

In the realm of machine and deep learning regression tasks, effective feature engineering (FE) plays a pivotal role in enhancing model performance. Traditional approaches to FE often rely on domain expertise to manually design features for machine learning models. In deep learning models, FE is embedded in the neural network's architecture, which makes it hard to interpret. In this study, we propose to integrate symbolic regression (SR) as an FE step before a machine learning model to improve its performance. Through extensive experimentation on synthetic and real-world physics-related datasets, we show that incorporating SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models, with a 34-86% root mean square error (RMSE) improvement on synthetic datasets and a 4-11.5% improvement on real-world datasets. In addition, as a realistic use case, we show that the proposed method improves machine learning performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models.
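The pipeline described in the abstract (run SR first, then train a standard regressor on the raw features plus the SR-derived one) can be sketched in a few dozen lines. This is a minimal, stdlib-only sketch under loud assumptions: a brute-force search over a tiny expression grammar stands in for a real SR engine (the abstract does not name the SR tool; genetic-programming libraries such as PySR or gplearn are common choices), ordinary least squares stands in for the downstream machine or deep learning model, and the synthetic target y = x1·x2 + noise as well as every function name (`make_data`, `candidate_features`, `select_feature`, `fit_ols`) are hypothetical.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical toy data: the target depends on the product x1 * x2."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(1.0, 3.0), rng.uniform(1.0, 3.0)
        X.append([x1, x2])
        y.append(x1 * x2 + rng.gauss(0.0, 0.05))
    return X, y


def candidate_features(n_features):
    """Enumerate a tiny expression grammar (a stand-in, not the paper's SR)."""
    unary = {"sq": lambda a: a * a, "sqrt": lambda a: math.sqrt(abs(a))}
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b if b != 0 else 0.0,
    }
    cands = []
    for i in range(n_features):
        for uname, u in unary.items():
            cands.append((f"{uname}(x{i + 1})", lambda r, i=i, u=u: u(r[i])))
        for j in range(n_features):
            if i == j:
                continue
            for op, f in binary.items():
                cands.append((f"x{i + 1} {op} x{j + 1}",
                              lambda r, i=i, j=j, f=f: f(r[i], r[j])))
    return cands


def abs_corr(a, b):
    """Absolute Pearson correlation; 0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return 0.0 if va == 0 or vb == 0 else abs(cov) / math.sqrt(va * vb)


def select_feature(X, y):
    """'SR' step: keep the expression that best explains y on training data."""
    return max(candidate_features(len(X[0])),
               key=lambda c: abs_corr([c[1](r) for r in X], y))


def fit_ols(X, y):
    """Least squares with intercept: normal equations solved by Gauss-Jordan."""
    A = [[1.0] + list(r) for r in X]
    k = len(A[0])
    M = [[sum(a[p] * a[q] for a in A) for q in range(k)]
         + [sum(a[p] * t for a, t in zip(A, y))] for p in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


X_train, y_train = make_data(200, seed=0)
X_test, y_test = make_data(100, seed=1)

# Baseline: the regressor sees only the raw features.
rmse_base = rmse(y_test, predict(fit_ols(X_train, y_train), X_test))

# SR as FE: append the selected expression as an extra column, then refit.
name, feat = select_feature(X_train, y_train)


def augment(X):
    return [r + [feat(r)] for r in X]


beta_sr = fit_ols(augment(X_train), y_train)
rmse_sr = rmse(y_test, predict(beta_sr, augment(X_test)))

print(f"selected feature: {name}")
print(f"baseline RMSE: {rmse_base:.3f}  SR-augmented RMSE: {rmse_sr:.3f}")
print(f"improvement: {100 * (rmse_base - rmse_sr) / rmse_base:.1f}%")
```

With these settings the linear baseline cannot represent the product term, so its test RMSE should be several times larger than that of the model given the SR-selected feature. Selecting by absolute Pearson correlation is equivalent to picking the expression with the highest R² in a one-feature linear fit; a real SR engine typically searches much deeper expression trees and penalizes overly complex expressions.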

