LG AT ML

Apr 5, 2019

A topological data analysis based classification method for multiple measurements

Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam

arXiv:1904.02971v1

29 citations

Originality: Incremental advance

AI Analysis

The paper provides an accurate classifier and feature selection tool for repeated measurement data in the biological sciences, though it appears incremental because it applies a known method (TDA) to a specific data type.

The paper tackles the classification of repeated measurements, for which few machine learning models exist, by introducing a topological data analysis (TDA) based classifier that samples from the data space and builds a network graph based on the data topology. Reported accuracy reaches up to 90% for the tree species dataset and 96.8% for the point process dataset, and in both cases the TDA classifier outperforms an alternative model.

Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values. For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
