LG · May 29

Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad, Henrik Ohlsson

arXiv:2605.31580 · 33.9

Predicted impact: top 67% in LG · last 90 days

Originality: Highly original

AI Analysis

This work addresses general-purpose representation learning for heterogeneous multivariate time series, a problem relevant to researchers and practitioners who work with sensor data.

The paper introduces CHARM (Channel-Aware Representation Model), a Transformer encoder that incorporates channel-level textual descriptions and is trained with a JEPA objective. Using only a linear probe, the learned embeddings achieve strong performance across anomaly detection, classification, and short- and long-term forecasting.

Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.
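
As a minimal illustration of what "equivariant to channel order" means in the abstract, the sketch below applies generic single-head self-attention across the channel axis with no positional encoding and checks that permuting the input channels permutes the output rows the same way. This is not the CHARM architecture: the function `channel_self_attention`, the random weights, and the shapes are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_self_attention(x, w_q, w_k, w_v):
    """Hypothetical single-head self-attention over channels.

    x has shape (n_channels, d_model): one embedding per sensor channel.
    No positional encoding is added, so the channel index carries no signal.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
n_channels, d_model = 5, 8  # illustrative sizes, not taken from the paper

x = rng.normal(size=(n_channels, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

perm = rng.permutation(n_channels)

out = channel_self_attention(x, w_q, w_k, w_v)
out_perm = channel_self_attention(x[perm], w_q, w_k, w_v)

# Equivariance: encoding the permuted channels equals permuting the encoding.
equivariant = np.allclose(out_perm, out[perm])
print(equivariant)
```

In this sketch the property holds because nothing in the computation depends on a channel's index. Channel identity then has to come from the channel's content rather than its position, which is consistent with the abstract's remark that text descriptions serve as channel identifiers.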
