One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
This paper addresses overfitting in time series classification and is relevant to data mining practitioners, but its contribution is incremental, since it builds on prior bio-inspired optimization methods.
The paper tackles the problem of optimizing breakpoint locations and segment weights in the symbolic aggregate approximation for time series classification. It finds that overfitting can obscure the performance of optimization algorithms, with results showing classification accuracy improvements of up to 5% on certain datasets.
Over the last few decades, optimization has developed rapidly. Bio-inspired optimization algorithms are metaheuristics inspired by nature. They have been applied to problems in engineering, economics, and other domains, as well as in branches of information technology such as networking and software engineering. Time series data mining has its share of these applications too. In previous work we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation representation of time series, and in another work we showed how particle swarm optimization, one of the best-known bio-inspired algorithms, can be used to assign weights to the different segments of that representation. In this paper we present, in two different approaches, a new meta-optimization process that produces optimal locations of the breakpoints together with optimal weights of the segments. The time series classification experiments we conducted provide an interesting example of how overfitting, a frequently encountered problem in data mining that occurs when the model fits the training set too closely, can interfere with the optimization process and hide the superior performance of an optimization algorithm.
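To make the two quantities being optimized concrete, the following is a minimal Python sketch of a symbolic aggregate approximation with explicit breakpoints and per-segment weights. It is an illustration under stated assumptions, not the authors' implementation: the function names `paa`, `sax_symbols`, and `weighted_mindist` are hypothetical, the series length is assumed to be divisible by the number of segments, the series is assumed to be non-constant, and the weighted distance simply scales each squared term of the standard lower-bounding distance by a weight, which may differ from the form used in the paper. An optimizer such as differential evolution or particle swarm optimization would search over `breakpoints` and `weights`.

```python
import numpy as np


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    # Simplifying assumption: len(series) is divisible by n_segments.
    return series.reshape(n_segments, -1).mean(axis=1)


def sax_symbols(series, n_segments, breakpoints):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol index."""
    series = np.asarray(series, dtype=float)
    # Assumes a non-constant series, so the standard deviation is nonzero.
    z = (series - series.mean()) / series.std()
    # Symbol k means the segment mean lies between breakpoints[k-1] and breakpoints[k].
    return np.searchsorted(breakpoints, paa(z, n_segments))


def weighted_mindist(sym_a, sym_b, breakpoints, weights, series_length):
    """Lower-bounding symbolic distance with one weight per segment (assumed form)."""
    bp = np.asarray(breakpoints, dtype=float)
    sym_a = np.asarray(sym_a)
    sym_b = np.asarray(sym_b)
    lo = np.minimum(sym_a, sym_b)
    hi = np.maximum(sym_a, sym_b)
    # Clip indices so the lookup is valid even where the result is discarded below.
    upper = bp[np.clip(hi - 1, 0, len(bp) - 1)]
    lower = bp[np.clip(lo, 0, len(bp) - 1)]
    # Identical or adjacent symbols contribute zero; otherwise use the breakpoint gap.
    cell = np.where(hi - lo <= 1, 0.0, upper - lower)
    scale = np.sqrt(series_length / len(sym_a))
    return scale * np.sqrt(np.sum(np.asarray(weights, dtype=float) * cell ** 2))


# Approximate Gaussian quartile breakpoints for an alphabet of size 4.
breakpoints = [-0.67, 0.0, 0.67]
a = sax_symbols([0, 0, 0, 0, 10, 10, 10, 10], 2, breakpoints)
b = sax_symbols([10, 10, 10, 10, 0, 0, 0, 0], 2, breakpoints)

d_equal = weighted_mindist(a, b, breakpoints, [1.0, 1.0], 8)
d_first = weighted_mindist(a, b, breakpoints, [1.0, 0.0], 8)
print(a, b, d_equal, d_first)
```

Setting the second weight to zero removes that segment's contribution, so `d_first` is smaller than `d_equal`. Because breakpoints and weights tuned this way are fitted to the training set, their accuracy there can overstate accuracy on unseen data, which is the overfitting effect the abstract describes.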