Language Model Training Paradigms for Clinical Feature Embeddings
This work addresses data scarcity in clinical research by improving feature embeddings for clinical time series, though the contribution appears incremental because it builds on existing language model training paradigms.
The paper tackles representation learning for clinical time series by learning universal embeddings for clinical features with self-supervised language model training. These embeddings operate at a finer granularity than existing time-step and patient-level representations, and the paper demonstrates their effectiveness on the MIMIC-III benchmark.
In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.
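To make the language-model analogy concrete, the sketch below treats each clinical feature in one time step as a token and hides a random subset of values, in the style of masked language modeling. The abstract does not state which self-supervised objective is used, so the masked-feature setup, the function name `mask_features`, the `mask_prob` parameter, and the example feature names are all illustrative assumptions, not the authors' method.

```python
import random
from typing import Dict, List, Optional, Tuple, Union

MASK = "[MASK]"  # hypothetical placeholder token, analogous to BERT's mask token

Token = Tuple[str, Union[float, str]]


def mask_features(
    measurements: Dict[str, float],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Token], Dict[str, float]]:
    """Treat each clinical feature as a token and hide some of the values.

    Returns the (feature, value-or-MASK) sequence a model would see and
    the hidden values it would be trained to reconstruct.
    """
    rng = rng or random.Random()
    names = sorted(measurements)
    # Mask at least one feature so every time step yields a training signal.
    n_mask = max(1, round(mask_prob * len(names)))
    masked = set(rng.sample(names, n_mask))
    tokens: List[Token] = [
        (name, MASK if name in masked else measurements[name]) for name in names
    ]
    targets = {name: measurements[name] for name in masked}
    return tokens, targets


# One time step of made-up vital signs (not taken from MIMIC-III).
step = {"heart_rate": 82.0, "systolic_bp": 118.0, "diastolic_bp": 76.0, "spo2": 97.0}
tokens, targets = mask_features(step, mask_prob=0.5, rng=random.Random(0))
```

Under this assumed setup, a model that learns to predict the hidden values from the visible ones would need one embedding per feature name, which is the feature-level granularity the abstract contrasts with time-step and patient-level representations.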