NE LG · Jan 5, 2020

Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality

arXiv:2001.01331v1 · 17 citations

AI Analysis

This work addresses the need for transparent and interpretable manifold learning in exploratory data analysis, offering an incremental improvement over previous methods by automating dimensionality selection.

The paper tackles two problems with existing manifold learning algorithms: they are opaque, and they require the dimensionality of the embedding to be known in advance. It introduces a multi-objective genetic programming approach that automatically balances manifold quality against dimensionality. The approach performs competitively with baseline and state-of-the-art methods while also producing interpretable solutions.
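"Balancing" two competing objectives is usually done with Pareto dominance: a solution is kept only if no other solution is at least as good in both objectives and strictly better in one. The sketch below assumes each candidate is summarised as a hypothetical `(cost, dims)` pair, where `cost` is a manifold-quality loss and `dims` is the embedding dimensionality, both minimised. It shows only the generic non-dominated filter; it is not the paper's selection procedure, and the population values are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate summary, both objectives minimised:
#   cost: a manifold-quality loss (lower is better)
#   dims: number of embedding dimensions (lower is simpler)
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by dimensionality then cost."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: (c[1], c[0]))


if __name__ == "__main__":
    # Illustrative (cost, dims) pairs only; these are not results from the paper.
    population = [(0.60, 1), (0.35, 2), (0.40, 2), (0.20, 3), (0.22, 5), (0.18, 6)]
    for cost, dims in pareto_front(population):
        print(f"dims={dims} cost={cost:.2f}")
```

Each surviving point is one trade-off: moving to a higher-dimensional solution on the front is only worthwhile because it buys a lower quality loss. Here `(0.40, 2)` is dropped because `(0.35, 2)` is better at the same dimensionality, and `(0.22, 5)` is dropped because `(0.20, 3)` is better with fewer dimensions.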

Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce the dimensionality of a dataset while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding the way in which the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performed manifold learning by evolving mappings that are transparent and interpretable. This method required the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a range (front) of solutions that give different trade-offs between quality and dimensionality. Furthermore, the learned models are shown to often be simple and efficient, utilising only a small number of features in an interpretable manner.
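To make "manifold quality" and "interpretable mapping" concrete, the sketch below scores an embedding by how many of each point's k nearest neighbours survive the mapping. This k-nearest-neighbour preservation score is a common stand-in chosen for illustration; the abstract does not specify the paper's actual quality measure, which may differ. `evolved_mapping` is a hypothetical example of the kind of small expression a GP tree could encode: each output dimension is a readable formula over a few input features.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], i: int, k: int) -> set:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself.

    Ties in distance are broken by index, so the result is deterministic.
    """
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j for _, j in dists[:k]}


def neighbourhood_preservation(high: List[Point], low: List[Point], k: int) -> float:
    """Mean fraction of each point's k nearest neighbours that are kept in the embedding.

    1.0 means every neighbourhood is preserved; lower values mean more structure was lost.
    """
    total = 0.0
    for i in range(len(high)):
        total += len(knn_indices(high, i, k) & knn_indices(low, i, k)) / k
    return total / len(high)


def evolved_mapping(x: Point) -> List[float]:
    """Hypothetical GP-style mapping from 3 features to a 2-D embedding.

    Each output is a short expression, so a reader can see which features it uses.
    """
    return [x[0] + x[1], x[2]]


if __name__ == "__main__":
    # Two small, well-separated clusters in 3-D (illustrative data only).
    data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 1), (6, 5, 1), (5, 6, 1)]
    embedding = [evolved_mapping(x) for x in data]
    print(neighbourhood_preservation(data, embedding, k=2))  # → 1.0
```

In a multi-objective setting, `1 - neighbourhood_preservation(...)` could serve as the quality loss and `len(evolved_mapping(x))` as the dimensionality objective, giving exactly the kind of `(cost, dims)` pair a Pareto front is built from.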

