Multivariate Functional Linear Discriminant Analysis for the Classification of Short Time Series with Missing Data
MUDRA provides an interpretable classification tool for medical or psychological data sets with large proportions of missing data, representing an incremental advance.
The authors tackle the problem of classifying short multivariate time series with missing data by developing a multivariate functional linear discriminant analysis (MUDRA) with an efficient ECM algorithm. They show improved predictive power over state-of-the-art methods, especially when data are missing.
Functional linear discriminant analysis (FLDA) is a powerful tool that extends LDA-mediated multiclass classification and dimension reduction to univariate time-series functions. However, in the age of large multivariate and incomplete data, there is a need for a computationally tractable approach that accounts for the statistical dependencies between features and can handle missing values. Here, we develop a multivariate version of FLDA (MUDRA) to address this need and describe an efficient expectation/conditional-maximization (ECM) algorithm to infer its parameters. We assess its predictive power on the "Articulary Word Recognition" data set and show that it improves on the state of the art, especially when data are missing. MUDRA allows interpretable classification of data sets with large proportions of missing data, which will be particularly useful for medical or psychological data sets.
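To make the setting concrete, the following is a minimal illustrative sketch of classifying short multivariate time series with missing entries. It is not MUDRA and does not use the ECM algorithm: it is a simplified two-stage stand-in that (1) fits each feature's observed points to a small polynomial basis by ridge least squares and (2) applies ordinary LDA to the stacked coefficients. The function names, the polynomial basis, the ridge and shrinkage constants, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def basis(t, degree):
    # Polynomial basis 1, t, ..., t^degree (assumes t is scaled to [0, 1]).
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

def fit_coefficients(series, times, degree=2, ridge=1e-3):
    # series: (T, F) array; np.nan marks a missing observation.
    # Each feature is fitted using only its observed time points.
    B = basis(times, degree)
    coefs = []
    for f in range(series.shape[1]):
        obs = ~np.isnan(series[:, f])
        Bo = B[obs]
        A = Bo.T @ Bo + ridge * np.eye(B.shape[1])
        coefs.append(np.linalg.solve(A, Bo.T @ series[obs, f]))
    return np.concatenate(coefs)

def fit_lda(X, y, shrink=1e-2):
    # Plain LDA: class means plus a shared, lightly shrunk covariance.
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X) + shrink * np.eye(X.shape[1])
    return classes, means, np.linalg.inv(cov)

def predict_lda(model, X):
    # Assign each sample to the class with the smallest Mahalanobis distance.
    classes, means, prec = model
    diff = X[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diff, prec, diff)
    return classes[d2.argmin(axis=1)]

# Synthetic example (hypothetical data): 2 features, 10 time points, ~30% missing.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 10)

def simulate(label, n):
    out = []
    for _ in range(n):
        trend = times if label == 0 else 1.0 - times
        s = np.stack([trend, trend ** 2], axis=1)
        s = s + 0.1 * rng.standard_normal(s.shape)
        s[rng.random(s.shape) < 0.3] = np.nan  # drop entries at random
        out.append(s)
    return out

train = simulate(0, 40) + simulate(1, 40)
y_train = np.array([0] * 40 + [1] * 40)
test = simulate(0, 20) + simulate(1, 20)
y_test = np.array([0] * 20 + [1] * 20)

X_train = np.array([fit_coefficients(s, times) for s in train])
X_test = np.array([fit_coefficients(s, times) for s in test])

model = fit_lda(X_train, y_train)
accuracy = float((predict_lda(model, X_test) == y_test).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

Unlike this sketch, which treats curve fitting and discrimination as separate steps and each feature independently, the approach described in the abstract estimates the dependencies between features jointly with the discriminant model.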