LG ML · Sep 12, 2019

A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency

Anthony Bagnall, Franz Király, Markus Löning, Matthew Middlehurst, George Oastler

arXiv:1909.05738v35.46 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable and efficient time series classification toolkits for researchers and practitioners, but it is incremental as it focuses on benchmarking existing implementations.

The researchers benchmarked six time series classification algorithms implemented in the sktime toolkit against their equivalents in the tsml package. They found a significant difference in accuracy for only one algorithm, Proximity Forest, but a much wider range of differences in efficiency.

sktime is an open-source, Python-based, sklearn-compatible toolkit for time series analysis developed by researchers at the University of East Anglia (UEA), University College London and the Alan Turing Institute. A key initial goal for sktime was to provide time series classification functionality equivalent to that available in a related Java package, tsml, also developed at UEA. We describe the implementation of six such classifiers in sktime and compare them to their tsml equivalents. We demonstrate correctness through equivalence of accuracy on a range of standard test problems and compare the build time of the different implementations. We find a significant difference in accuracy for only one of the six algorithms we look at (Proximity Forest). This difference is causing us some pain in debugging. We found a much wider range of differences in efficiency. Again, this was not unexpected, but it does highlight ways both toolkits could be improved.
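The comparison described in the abstract (paired accuracies on the same test problems, plus build time) can be illustrated with a minimal sketch. This is not the authors' benchmarking code: the abstract does not say which statistical test was used, so the exact two-sided sign test here is an assumption chosen for simplicity, and the helper names `sign_test_p_value` and `timed_build` are hypothetical. Only the Python standard library is used.

```python
import math
import time


def sign_test_p_value(acc_a, acc_b):
    """Exact two-sided sign test on paired per-dataset accuracies.

    acc_a and acc_b hold the accuracy of two implementations of the same
    classifier, one entry per dataset, in the same dataset order.
    (Assumed test; the source does not name the one actually used.)
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("accuracy lists must be paired, one entry per dataset")
    wins_a = sum(a > b for a, b in zip(acc_a, acc_b))
    wins_b = sum(b > a for a, b in zip(acc_a, acc_b))
    n = wins_a + wins_b  # ties carry no information and are dropped
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def timed_build(build_fn):
    """Call build_fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = build_fn()
    return result, time.perf_counter() - start
```

In use, `build_fn` would wrap the fitting call of one implementation (for example a closure around a classifier's `fit`), and the two accuracy lists would come from evaluating both toolkits on identical train/test splits. A small p-value suggests the two implementations are not equivalent in accuracy; a large one is consistent with equivalence but does not prove it.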
