SP AIMay 13

Compact Latent Manifold Translation: A Parameter-Efficient Foundation Model for Cross-Modal and Cross-Frequency Physiological Signal Synthesis

Bo Cui, Xiaowen Song, Yaowen Zhang, Shunzhe Zhang, B. J. F. van Beijnum, Monique Tabak, Ying Wang

arXiv:2605.1324894.9

AI Analysis

This work enables high-fidelity, edge-deployable medical foundation models for bridging modality and frequency gaps in physiological time series, addressing a critical bottleneck in heterogeneous healthcare devices.

CLMT introduces a parameter-efficient (0.09B) two-stage discrete translation paradigm for cross-modal and cross-frequency physiological signal synthesis, achieving a clinical R-peak detection F1-score of 0.83 (from 0.37) in PPG-to-ECG and a Pearson correlation of 0.9956 in 25Hz-to-100Hz super-resolution.

The analysis of physiological time series, such as electrocardiograms (ECG) and photoplethysmograms (PPG), is persistently hindered by modality and frequency gaps stemming from heterogeneous recording devices. Existing foundation models typically rely on continuous latent spaces, which frequently suffer from severe modality entanglement, lack high-fidelity cross-frequency generative capacity, and impose high computational costs that prohibit edge-device deployment. In this paper, we propose Compact Latent Manifold Translation (CLMT), a highly parameter-efficient (0.09B) unified framework that bridges these gaps through a novel two-stage discrete translation paradigm. First, we introduce a Universal Tokenizer utilizing Hierarchical Residual Vector Quantization (RVQ) to decouple heterogeneous signals into isolated, well-structured discrete latent manifolds, effectively preventing inter-modality interference. Second, a Context-Prompted Latent Translator maps these discrete tokens across modalities by integrating static physiological priors, reframing complex signal synthesis as a pure latent sequence translation task. Extensive evaluations demonstrate that our 0.09B model significantly outperforms massive baselines. In cross-modal PPG-to-ECG synthesis, it resolves temporal phase drift and dramatically improves the clinical R-peak detection F1-score from 0.37 (baseline) to 0.83. Furthermore, in extreme cross-frequency super-resolution (25Hz to 100Hz), it successfully recovers high-frequency diagnostic landmarks, achieving an unprecedented Pearson correlation of 0.9956. By learning a universal discrete language for biological signals with a fraction of the computational footprint, our approach sets a new trajectory for edge-deployable, multi-modal medical foundation models.

View on arXiv PDF

Similar