SP · Sep 12, 2024
EEG-EMG FAConformer: Frequency Aware Conv-Transformer for the fusion of EEG and EMG
ZhengXiao He, Minghong Cai, Letian Li et al.
Motor pattern recognition paradigms are the main forms of Brain-Computer Interfaces (BCI) aimed at motor function rehabilitation and are the most readily adopted applications. In recent years, many researchers have suggested encouraging patients to perform real motor execution simultaneously in motor imagery (MI)-based BCI rehabilitation training systems. Electromyography (EMG) signals are the most direct physiological signals for assessing the execution of movements, so multimodal signal fusion is practically significant for decoding motor patterns. We therefore introduce a multimodal motion pattern recognition algorithm for electroencephalography (EEG) and EMG signals: EEG-EMG FAConformer, a method with several attention modules that capture temporal and frequency information for motor pattern recognition. In particular, we devise a frequency band attention module to encode EEG information accurately and efficiently. In addition, we develop a Multi-Scale Fusion Module, an Independent Channel-Specific Convolution Module (ICSCM), and a Fuse Module, which effectively eliminate irrelevant information in EEG and EMG signals and fully exploit hidden dynamics. Extensive experiments show that EEG-EMG FAConformer surpasses existing methods on the Jeong2020 dataset, demonstrating strong performance, robustness, and stability.
LG · Jan 21
Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding
Huayu Li, ZhengXiao He, Siyuan Tian et al.
Standard autoregressive decoding in large language models (LLMs) is inherently short-sighted, often failing to find globally optimal reasoning paths due to its token-by-token generation process. While inference-time strategies like foresight sampling attempt to mitigate this by simulating future steps, they typically rely on ad-hoc heuristics for valuing paths and pruning the search space. This paper introduces Martingale Foresight Sampling (MFS), a principled framework that reformulates LLM decoding as a problem of identifying an optimal stochastic process. By modeling the quality of a reasoning path as a stochastic process, we leverage Martingale theory to design a theoretically-grounded algorithm. Our approach replaces heuristic mechanisms with principles from probability theory: step valuation is derived from the Doob Decomposition Theorem to measure a path's predictable advantage, path selection uses Optional Stopping Theory for principled pruning of suboptimal candidates, and an adaptive stopping rule based on the Martingale Convergence Theorem terminates exploration once a path's quality has provably converged. Experiments on six reasoning benchmarks demonstrate that MFS surpasses state-of-the-art methods in accuracy while significantly improving computational efficiency. Code will be released at https://github.com/miraclehetech/EACL2026-Martingale-Foresight-Sampling.
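For reference, the Doob Decomposition Theorem named in the abstract can be stated as below. This is the standard textbook statement only; reading $X_n$ as the quality score of a reasoning path after $n$ steps is an assumption made here for illustration, and the abstract does not specify how MFS turns the predictable part into a step value.

```latex
% Standard Doob decomposition of an adapted, integrable process (X_n)
% with respect to a filtration (\mathcal{F}_n).
\[
  X_n = X_0 + M_n + A_n, \qquad n \ge 0,
\]
\[
  A_n = \sum_{k=1}^{n} \bigl( \mathbb{E}[X_k \mid \mathcal{F}_{k-1}] - X_{k-1} \bigr),
  \qquad
  M_n = \sum_{k=1}^{n} \bigl( X_k - \mathbb{E}[X_k \mid \mathcal{F}_{k-1}] \bigr).
\]
% (A_n) is predictable (A_n is \mathcal{F}_{n-1}-measurable) and carries the drift;
% (M_n) is a martingale with M_0 = 0 and carries the unpredictable fluctuation.
```

Under that reading, the increment $A_n - A_{n-1} = \mathbb{E}[X_n \mid \mathcal{F}_{n-1}] - X_{n-1}$ is the expected one-step change in path quality given everything generated so far.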
86.4 · SP · Apr 17
MedMamba: Recasting Mamba for Medical Time Series Classification
ZhengXiao He, Huayu Li, Xiwen Chen et al.
Medical time series, such as electrocardiograms (ECG) and electroencephalograms (EEG), exhibit complex temporal dynamics and structured cross-channel dependencies, posing fundamental challenges for automated analysis. Conventional convolutional and recurrent models struggle to capture long-range dependencies, while Transformer-based approaches incur quadratic complexity and often introduce redundant interactions that are misaligned with the intrinsic structure of physiological signals. To address these limitations, we propose MedMamba, a principle-driven multi-scale bidirectional state space architecture tailored for medical time series classification. Our design is guided by three key inductive biases of physiological signals: spatial centralization, multi-timescale temporal composition, and non-causal contextual dependency. These principles are instantiated through a lightweight channel-mixing module for cross-channel reparameterization, multi-scale convolutional tokenization for temporal decomposition, and bidirectional Mamba blocks for efficient global context modeling with linear complexity. Extensive experiments on six benchmark datasets spanning EEG, ECG, and human activity signals demonstrate that MedMamba consistently outperforms state-of-the-art methods across diverse modalities. Notably, it achieves 85.97% accuracy on PTB and establishes new state-of-the-art performance on the challenging ADFTD dataset (54.72% accuracy and 52.01% F1-score). Strong results on long-sequence benchmarks, such as SleepEDF, further validate its capability in modeling long-range dependencies. Moreover, MedMamba achieves a speedup of 4.6x in inference, highlighting its practicality for real-time clinical deployment. These results suggest that principle-guided state space modeling offers an effective and scalable alternative to Transformer-based approaches for medical time series analysis.
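The idea of non-causal context at linear cost can be illustrated with a toy bidirectional scan. This is a minimal sketch, not the MedMamba block: it uses a fixed scalar recurrence instead of Mamba's input-dependent selective scan, and the names `linear_scan`, `bidirectional_scan`, `decay`, and `gain` are hypothetical.

```python
import numpy as np

def linear_scan(x, decay, gain):
    """Causal recurrence h_t = decay * h_{t-1} + gain * x_t over the time axis.

    x has shape (time, channels); the cost is linear in the number of time steps.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = decay * h + gain * x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.9, gain=0.1):
    """Sum a forward scan and a time-reversed scan so each step sees past and future."""
    forward = linear_scan(x, decay, gain)
    backward = linear_scan(x[::-1], decay, gain)[::-1]
    return forward + backward

# An impulse in the middle of the sequence influences both earlier and later steps.
impulse = np.zeros((5, 1))
impulse[2, 0] = 1.0
response = bidirectional_scan(impulse)
```

With a purely causal scan the steps before the impulse would stay at zero; the reversed pass is what lets them respond.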
33.7 · LG · Apr 30
Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization
Huayu Li, ZhengXiao He, Xiwen Chen et al.
Learning meaningful representations from medical time series (MedTS) such as ECG or EEG signals is a critical challenge. These signals are often high-dimensional, variable-length, and rife with noise. Existing self-supervised approaches, such as Masked Autoencoders (MAEs), are highly effective for pre-training general-purpose encoders. However, they do not explicitly learn compact and semantically interpretable latent representations, typically relying on heuristic aggregation strategies such as global average pooling or a designated [CLS] token. We propose a novel framework that compresses a variable-length MedTS into a fixed-size set of k latent Fingerprint Tokens. Our architecture employs a cross-attention bottleneck to generate these tokens and is trained with a dual-objective function. The first objective is a reconstruction loss, which ensures the tokens are sufficient statistics for the original data. The second, a diversity penalty based on the Total Coding Rate (TCR), explicitly minimizes the redundancy between tokens, encouraging them to become statistically disentangled representations. We present the theoretical justification for our method, framing it as a novel Disentangled Rate-Distortion problem. This approach produces a low-dimensional, interpretable, and sample-efficient representation, where each token is encouraged to capture an independent factor of variation, paving the way for more robust digital biomarkers.
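To make the diversity term concrete, here is a minimal sketch of a total coding rate computed over a set of tokens, using the form common in the rate-reduction literature, R(Z) = 1/2 log det(I + d/(k ε²) ZᵀZ). Whether the authors use exactly this form, the value of `eps`, and the function name `total_coding_rate` are all assumptions.

```python
import numpy as np

def total_coding_rate(tokens, eps=0.5):
    """Coding rate of k tokens of dimension d, given as an array of shape (k, d).

    Assumed form: 0.5 * logdet(I_d + d / (k * eps^2) * Z^T Z).
    Larger values mean the tokens span more directions (less redundancy).
    """
    k, d = tokens.shape
    gram = tokens.T @ tokens                      # (d, d) second-moment matrix
    scale = d / (k * eps ** 2)
    _, logdet = np.linalg.slogdet(np.eye(d) + scale * gram)
    return 0.5 * logdet

# Four mutually orthogonal tokens versus four copies of the same token.
diverse = np.eye(4)
redundant = np.tile(np.eye(4)[0], (4, 1))
rate_diverse = total_coding_rate(diverse)
rate_redundant = total_coding_rate(redundant)
```

The orthogonal set scores higher than the duplicated set, so a penalty such as the negative coding rate (again an assumption about the sign convention) would push tokens apart.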
SP · Jul 12, 2025
NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment
ZhengXiao He, Jinghao Wen, Huayu Li et al.
We present a novel and interpretable framework for electrocardiogram (ECG)-based disease detection that combines hyperdimensional computing (HDC) with learnable neural encoding. Unlike conventional HDC approaches that rely on static, random projections, our method introduces a rhythm-aware and trainable encoding pipeline based on RR intervals, a physiological signal segmentation strategy that aligns with cardiac cycles. The core of our design is a neural-distilled HDC architecture, featuring a learnable RR-block encoder and a BinaryLinear hyperdimensional projection layer, optimized jointly with cross-entropy and proxy-based metric loss. This hybrid framework preserves the symbolic interpretability of HDC while enabling task-adaptive representation learning. Experiments on Apnea-ECG and PTB-XL demonstrate that our model significantly outperforms traditional HDC and classical ML baselines, achieving 73.09% precision and an F1 score of 0.626 on Apnea-ECG, with comparable robustness on PTB-XL. Our framework offers an efficient and scalable solution for edge-compatible ECG classification, with strong potential for interpretable and personalized health monitoring.
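The two ingredients named above, RR-interval segmentation and a binary hyperdimensional projection, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's encoder: R-peak positions are taken as given, each beat is resampled to a fixed length, and the projection simply binarizes a weight matrix by sign. The names `rr_blocks`, `bipolar_projection`, and `block_len` are hypothetical.

```python
import numpy as np

def rr_blocks(signal, r_peaks, fs, block_len=64):
    """Cut a 1-D ECG into beat-aligned blocks between consecutive R peaks.

    Returns RR intervals in seconds and blocks resampled to block_len samples.
    Requires at least two R-peak indices.
    """
    r_peaks = np.asarray(r_peaks)
    rr_seconds = np.diff(r_peaks) / fs
    blocks = []
    for start, stop in zip(r_peaks[:-1], r_peaks[1:]):
        beat = signal[start:stop]
        old_t = np.linspace(0.0, 1.0, num=len(beat))
        new_t = np.linspace(0.0, 1.0, num=block_len)
        blocks.append(np.interp(new_t, old_t, beat))  # fixed-length resampling
    return rr_seconds, np.stack(blocks)

def bipolar_projection(features, weights):
    """Project features with sign-binarized weights and binarize the result to {-1, +1}."""
    binary_w = np.where(weights >= 0, 1.0, -1.0)
    return np.where(features @ binary_w >= 0, 1, -1)

# Synthetic stand-in for an ECG trace with assumed R-peak positions.
rng = np.random.default_rng(0)
ecg = np.sin(np.linspace(0.0, 20.0, 1000))
rr, blocks = rr_blocks(ecg, r_peaks=[100, 350, 600], fs=250)
hypervectors = bipolar_projection(blocks, rng.standard_normal((64, 1000)))
```

In a trainable version the real-valued weights would be learned and binarized only in the forward pass; that training detail is not shown here.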
CV · Jun 18, 2025
Break Stylistic Sophon: Are We Really Meant to Confine the Imagination in Style Transfer?
Gary Song Yan, Yusen Zhang, Jinyu Zhao et al.
In this pioneering study, we introduce StyleWallfacer, a groundbreaking unified training and inference framework, which not only addresses various issues encountered in the style transfer process of traditional methods but also unifies the framework for different tasks. This framework is designed to revolutionize the field by enabling artist-level style transfer and text-driven stylization. First, we propose a semantic-based style injection method that uses BLIP to generate text descriptions strictly aligned with the semantics of the style image in CLIP space. By leveraging a large language model to remove style-related content from these descriptions, we create a semantic gap. This gap is then used to fine-tune the model, enabling efficient and drift-free injection of style knowledge. Second, we propose a data augmentation strategy based on human feedback, incorporating high-quality samples generated early in the fine-tuning process into the training set to facilitate progressive learning and significantly reduce overfitting. Finally, we design a training-free triple diffusion process using the fine-tuned model, which manipulates the features of self-attention layers in a manner similar to the cross-attention mechanism. Specifically, in the generation process, the key and value of the content-related process are replaced with those of the style-related process to inject style while maintaining text control over the model. We also introduce query preservation to mitigate disruptions to the original content. With this design, we achieve high-quality image-driven style transfer and text-driven stylization, delivering artist-level style transfer results while preserving the original image content. Moreover, we achieve image color editing during the style transfer process for the first time.
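The key/value replacement described above can be sketched on a single attention head. This is a simplified illustration, not the StyleWallfacer implementation: it ignores the diffusion loop, multi-head layout, and the third process, and it reads "query preservation" as simply keeping the content queries, which is an assumption. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention for one head; q is (n_q, d), k and v are (n_kv, d)."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax(q @ k.T * scale, axis=-1)
    return weights @ v

def style_injected_attention(q_content, k_style, v_style):
    """Keep the content queries, but attend over keys/values from the style process."""
    return attention(q_content, k_style, v_style)

# Content tokens query style features; every output row is a mixture of style values.
rng = np.random.default_rng(0)
q_content = rng.standard_normal((5, 8))
k_style = rng.standard_normal((7, 8))
v_style = np.tile(np.arange(8.0), (7, 1))   # identical style values for an easy check
mixed = style_injected_attention(q_content, k_style, v_style)
```

Because attention weights sum to one, each output row is a convex combination of style values, while the content queries decide which style tokens contribute where.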
IV · Apr 6, 2024
FastHDRNet: A new efficient method for SDR-to-HDR Translation
Siyuan Tian, Hao Wang, Yiren Rong et al.
Modern displays can render video content with a high dynamic range (HDR) and a wide color gamut. However, the majority of available content is still in standard dynamic range (SDR), so an effective SDR-to-HDR conversion method is needed. Existing deep neural network (DNN) based SDR-to-HDR conversion methods outperform conventional methods, but they are either too large to deploy or produce severe artifacts. We propose a neural network for SDR-to-HDR conversion, termed "FastHDRNet". The network consists of two parts: Adaptive Universal Color Transformation (AUCT) and Local Enhancement (LE). The architecture is designed as a lightweight network that uses global statistics and local information with very high efficiency. Experiments show that the proposed method achieves state-of-the-art performance in both quantitative comparisons and visual quality, with a lightweight structure and improved inference speed.