MLApr 24, 2023
Ordinal time series analysis with the R package otsfeaturesÁngel López Oriona, José Antonio Vilar Fernández
The 21st century has witnessed a growing interest in the analysis of time series data. Whereas most of the literature on the topic deals with real-valued time series, ordinal time series have typically received much less attention. However, the development of specific analytical tools for the latter objects has substantially increased in recent years. The R package otsfeatures attempts to provide a set of simple functions for analyzing ordinal time series. In particular, several commands allowing the extraction of well-known statistical features and the execution of inferential tasks are available for the user. The output of several functions can be employed to perform traditional machine learning tasks including clustering, classification or outlier detection. otsfeatures also incorporates two datasets of financial time series which were used in the literature for clustering purposes, as well as three interesting synthetic databases. The main properties of the package are described and its use is illustrated through several examples. Researchers from a broad variety of disciplines could benefit from the powerful tools provided by otsfeatures.
MLApr 30, 2023
Time series clustering based on prediction accuracy of global forecasting modelsÁngel López Oriona, Pablo Montero Manso, José Antonio Vilar Fernández
In this paper, a novel method to perform model-based clustering of time series is proposed. The procedure relies on two iterative steps: (i) K global forecasting models are fitted via pooling by considering the series pertaining to each cluster and (ii) each series is assigned to the group associated with the model producing the best forecasts according to a particular criterion. Unlike most techniques proposed in the literature, the method considers the predictive accuracy as the main element for constructing the clustering partition, which contains groups jointly minimizing the overall forecasting error. Thus, the approach leads to a new clustering paradigm where the quality of the clustering solution is measured in terms of its predictive capability. In addition, the procedure gives rise to an effective mechanism for selecting the number of clusters in a time series database and can be used in combination with any class of regression model. An extensive simulation study shows that our method outperforms several alternative techniques concerning both clustering effectiveness and predictive accuracy. The approach is also applied to perform clustering in several datasets used as standard benchmarks in the time series literature, obtaining great results.
MLApr 24, 2023
Analyzing categorical time series with the R package ctsfeaturesÁngel López Oriona, José Antonio Vilar Fernández
Time series data are ubiquitous nowadays. Whereas most of the literature on the topic deals with real-valued time series, categorical time series have received much less attention. However, the development of data mining techniques for this kind of data has substantially increased in recent years. The R package ctsfeatures offers users a set of useful tools for analyzing categorical time series. In particular, several functions allowing the extraction of well-known statistical features and the construction of illustrative graphs describing underlying temporal patterns are provided in the package. The output of some functions can be employed to perform traditional machine learning tasks including clustering, classification and outlier detection. The package also includes two datasets of biological sequences introduced in the literature for clustering purposes, as well as three interesting synthetic databases. In this work, the main characteristics of the package are described and its use is illustrated through various examples. Practitioners from a wide variety of fields could benefit from the valuable tools provided by ctsfeatures.