NE LG | Mar 1, 2022

On genetic programming representations and fitness functions for interpretable dimensionality reduction

Thomas Uriot, Marco Virgolin, Tanja Alderliesten, Peter Bosman

arXiv:2203.00528v28.211 citations | h-index: 38 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretable dimensionality reduction methods for data exploration, though it is incremental as it builds on existing genetic programming approaches.

The paper tackles the problem of interpretable dimensionality reduction by comparing existing genetic programming methods for evolving symbolic expressions and devising new ones, finding that these methods can be competitive with state-of-the-art algorithms in terms of predictive accuracy and feature reconstruction.

Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, Isomap), or produce mappings that cannot be easily interpreted (e.g., kernel PCA, neural-based autoencoders). Recently, genetic programming (GP) has been used to evolve interpretable DR mappings in the form of symbolic expressions. There are a number of ways in which GP can be used to this end, but no study has compared them. In this paper, we fill this gap by comparing existing GP methods as well as devising new ones. We evaluate our methods on several benchmark datasets based on predictive accuracy and on how well the original features can be reconstructed using the lower-dimensional representation only. Finally, we qualitatively assess the resulting expressions and their complexity. We find that various GP methods can be competitive with state-of-the-art DR algorithms and that they have the potential to produce interpretable DR mappings.
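
To make the idea of an explicit, symbolic DR mapping concrete, here is a minimal Python sketch. It is not the paper's method: the two expressions in `symbolic_mapping` are hand-written stand-ins for what GP might evolve, and the least-squares linear decoder in `reconstruction_error` is an assumed way of scoring how well the original features can be recovered from the low-dimensional representation alone. The abstract does not specify the actual fitness functions or decoder.

```python
import numpy as np


def symbolic_mapping(X):
    """Hypothetical 2-D embedding written as explicit formulas of the inputs.

    In a GP setting these expressions would be evolved; here they are fixed
    by hand purely for illustration.
    """
    z1 = X[:, 0] + X[:, 1]   # first component: a readable sum of two features
    z2 = X[:, 2] * X[:, 3]   # second component: a readable product
    return np.column_stack([z1, z2])


def reconstruction_error(X, Z):
    """Mean squared error of a least-squares linear decoder from Z back to X.

    Assumption: a linear decoder with intercept; other decoders are possible.
    """
    A = np.column_stack([Z, np.ones(len(Z))])        # append intercept column
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)     # fit all features at once
    return float(np.mean((X - A @ coef) ** 2))


# Synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

Z = symbolic_mapping(X)
err = reconstruction_error(X, Z)

# Baseline: predict every feature by its mean (no information from Z).
baseline = float(np.mean((X - X.mean(axis=0)) ** 2))
print(f"reconstruction MSE: {err:.3f} (mean-only baseline: {baseline:.3f})")
```

Because the mapping is a pair of short formulas, it can be read and inspected directly, which is the sense of "interpretable" used above. A score such as `err` could serve as one candidate fitness signal when searching over expressions.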
