NE | May 25, 2018

Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

Luiz Otavio Vilas Boas Oliveira, Joao Francisco Barreto da Silva Martins, Luis Fernando Miranda, Gisele Lobo Pappa

arXiv:1805.10365v1 | 6.09 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more effective testbeds in the Genetic Programming community, though it is incremental as it builds on existing meta-learning concepts.

The paper evaluates symbolic regression benchmarks through a meta-learning approach that correlates dataset meta-features with Genetic Programming errors. It finds that current benchmarks are concentrated in a small region of the meta-feature space, and that the number of instances and output skewness are the meta-features most relevant to the error.

The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to measure the quality of symbolic regression benchmarks quantitatively. The proposed approach is based on meta-learning and uses a set of dataset meta-features, such as the number of examples or output skewness, to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and output skewness are the meta-features most relevant to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.
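The idea of describing each dataset by meta-features and relating them to GP error can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which correlation measure or skewness estimator the paper uses, so the Pearson coefficient and the population moment-based skewness below are assumptions. The function names (`skewness`, `meta_features`, `pearson`) are hypothetical, and the toy outputs and error values are invented placeholders, not results from the paper.

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness m3 / m2**1.5 (assumed estimator, not from the paper)."""
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m3 = mean((v - mu) ** 3 for v in values)
    # A constant output has no spread, so report zero skewness.
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def meta_features(y):
    """Describe a dataset by the two meta-features named in the abstract."""
    return {"n_instances": len(y), "output_skewness": skewness(y)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical benchmark outputs and GP errors, for illustration only.
toy_outputs = {
    "dataset_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "dataset_b": [0.0, 0.0, 0.1, 0.2, 9.0],
    "dataset_c": [0.0, 0.1, 0.1, 0.3, 4.0, 0.2, 0.1, 0.0],
}
toy_gp_error = {"dataset_a": 0.10, "dataset_b": 0.45, "dataset_c": 0.30}

names = sorted(toy_outputs)
features = [meta_features(toy_outputs[name]) for name in names]
errors = [toy_gp_error[name] for name in names]

# Correlate each meta-feature with the (placeholder) GP error.
for key in ("n_instances", "output_skewness"):
    column = [f[key] for f in features]
    print(key, round(pearson(column, errors), 3))
```

Each dataset becomes one point in the meta-feature space, so the same `features` list could also be inspected for how widely the points are spread, which is the coverage question the abstract raises.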
