LGAIFeb 27, 2023

A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping

arXiv:2302.13457v221 citationsh-index: 46
Originality Incremental advance
AI Analysis

This work addresses a domain-specific problem for healthcare researchers and clinicians dealing with TBI patient data, offering an incremental improvement over existing clustering methods by handling missing values without imputation.

The paper tackles the problem of clustering multivariate time-series data with missing values, which often requires imputation leading to computational inefficiency and noise, by proposing SLAC-Time, a Transformer-based self-supervised method that uses time-series forecasting as a proxy task and jointly learns representations and cluster assignments; it outperforms baseline K-means on metrics like silhouette coefficient and identifies three distinct TBI phenotypes with clinical significance.

Self-supervised learning approaches provide a promising direction for clustering multivariate time-series data. However, real-world time-series data often include missing values, and the existing approaches require imputing missing values before clustering, which may cause extensive computations and noise and result in invalid interpretations. To address these challenges, we present a Self-supervised Learning-based Approach to Clustering multivariate Time-series data with missing values (SLAC-Time). SLAC-Time is a Transformer-based clustering method that uses time-series forecasting as a proxy task for leveraging unlabeled data and learning more robust time-series representations. This method jointly learns the neural network parameters and the cluster assignments of the learned representations. It iteratively clusters the learned representations with the K-means method and then utilizes the subsequent cluster assignments as pseudo-labels to update the model parameters. To evaluate our proposed approach, we applied it to clustering and phenotyping Traumatic Brain Injury (TBI) patients in the Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) study. Our experiments demonstrate that SLAC-Time outperforms the baseline K-means clustering algorithm in terms of silhouette coefficient, Calinski Harabasz index, Dunn index, and Davies Bouldin index. We identified three TBI phenotypes that are distinct from one another in terms of clinically significant variables as well as clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE) score, Intensive Care Unit (ICU) length of stay, and mortality rate. The experiments show that the TBI phenotypes identified by SLAC-Time can be potentially used for developing targeted clinical trials and therapeutic strategies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes