NE | May 25, 2018

Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

arXiv:1805.10365v1 | 9 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more effective testbeds in the Genetic Programming community, though it is incremental as it builds on existing meta-learning concepts.

The paper tackles the problem of evaluating symbolic regression benchmarks by proposing a meta-learning approach that correlates dataset meta-features with Genetic Programming errors. It finds that current benchmarks are concentrated in a small region of the meta-feature space and that the number of instances and output skewness are the key factors.

The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to quantitatively measure the quality of symbolic regression benchmarks. The proposed approach is based on meta-learning and uses a set of dataset meta-features (such as the number of examples or the output skewness) to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and the output skewness are the meta-features most relevant to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.
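
The core idea in the abstract, describing each dataset by meta-features and correlating those with GP error, can be illustrated with a minimal sketch. It is not the paper's implementation: the choice of Spearman rank correlation, the biased moment-based skewness estimator, the function names, and the toy targets and error values are all assumptions made here for illustration only.

```python
import math
from statistics import mean


def skewness(values):
    """Biased sample skewness m3 / m2**1.5 (estimator choice is an assumption)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def meta_features(y):
    """The two meta-features named in the abstract, computed from the target column."""
    return {"n_instances": len(y), "output_skewness": skewness(y)}


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns NaN when either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return float("nan") if sa == 0 or sb == 0 else cov / (sa * sb)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def correlate_with_error(datasets):
    """datasets: list of (target_values, gp_error) pairs, one per benchmark."""
    features = [meta_features(y) for y, _ in datasets]
    errors = [err for _, err in datasets]
    return {
        name: spearman([f[name] for f in features], errors)
        for name in features[0]
    }


# Placeholder targets and error values, NOT results from the paper.
toy_datasets = [
    ([1.0, 2.0, 3.0], 0.9),
    ([0.0, 0.0, 1.0, 1.0, 9.0], 0.5),
    ([0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0], 0.2),
]
correlations = correlate_with_error(toy_datasets)
```

In this toy setup the error shrinks as the number of instances grows, so the `n_instances` entry of `correlations` comes out strongly negative. With real benchmarks, each dataset would contribute one point in the meta-feature space, and the spread of those points would indicate how much of the space the testbed covers.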

Code Implementations: 1 repo