Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation
This work addresses the need for interpretable visualisations in domains that require an understanding of the original features, though it appears incremental because it builds on existing methods such as t-SNE.
The authors tackle the problem of opaque visualisations in data mining by proposing GP-tSNE, a genetic programming approach that evolves interpretable models for data visualisation and produces a variety of trade-offs between visual quality and model complexity in a single run.
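The trade-off between visual quality and model complexity can be illustrated with a small non-dominated filter. This is only a minimal sketch and not the authors' algorithm: the candidate labels, the cost values, and the tree sizes are all hypothetical, and it assumes both objectives are minimised (a visual cost where lower means a better visualisation, and a model size where lower means a simpler model).

```python
from typing import List, Tuple

# (label, visual_cost, model_size); every value here is hypothetical.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates, ordered by model size."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    return sorted(front, key=lambda c: c[2])


candidates = [
    ("A", 0.90, 3),   # very simple model, poor visual cost
    ("B", 0.60, 10),
    ("C", 0.65, 25),  # dominated by B: worse cost and larger model
    ("D", 0.40, 40),  # best visual cost, most complex model
    ("E", 0.95, 5),   # dominated by A
]

front = pareto_front(candidates)
for label, cost, size in front:
    print(f"{label}: visual cost {cost:.2f}, model size {size}")
```

Under these assumptions the front keeps A, B, and D: each is the best available visualisation at its level of complexity, which is the kind of choice a single multi-objective run is meant to offer.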
Data visualisation is a key tool in data mining for understanding big datasets. Many visualisation methods have been proposed, including the well-regarded state-of-the-art method t-Distributed Stochastic Neighbour Embedding. However, the most powerful visualisation methods have a significant limitation: the manner in which they create their visualisation from the original features of the dataset is completely opaque. Many domains require an understanding of the data in terms of the original features; there is hence a need for powerful visualisation methods which use understandable models. In this work, we propose a genetic programming approach named GP-tSNE for evolving interpretable mappings from a dataset to high-quality visualisations. A multi-objective approach is designed that produces a variety of visualisations in a single run which give different trade-offs between visual quality and model complexity. Testing against baseline methods on a variety of datasets shows the clear potential of GP-tSNE to allow deeper insight into data than that provided by existing visualisation methods. We further highlight the benefits of a multi-objective approach through an in-depth analysis of a candidate front, which shows how multiple models can