SPAug 20, 2023Code
Large Transformers are Better EEG LearnersBingxin Wang, Xiaowen Fu, Yuan Lan et al.
Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. However, the limited availability of public electroencephalogram (EEG) data presents a unique challenge for extending the success of these models to EEG-based tasks. To address this gap, we propose AdaCT, plug-and-play Adapters designed for Converting Time series data into spatio-temporal 2D pseudo-images or text forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel time series data into spatio-temporal 2D pseudo-images for fine-tuning pre-trained vision transformers, while AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers. The proposed approach allows for seamless integration of pre-trained vision models and language models in time series decoding tasks, particularly in EEG data analysis. Experimental results on diverse benchmark datasets, including Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the superiority of AdaCT over baseline methods. Overall, we provide a promising transfer learning framework for leveraging the capabilities of pre-trained vision and language models in EEG-based tasks, thereby advancing the field of time series decoding and enhancing interpretability in EEG data analysis. Our code will be available at https://github.com/wangbxj1234/AdaCE.
SPSep 20, 2024Code
Differentially Private Multimodal Laplacian Dropout (DP-MLD) for EEG Representative LearningXiaowen Fu, Bingxin Wang, Xinzhou Guo et al.
Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have been proposed under DP, it has not been extensively studied for multimodal EEG data due to the complexities of models and signal data considered there. In this paper, we propose a novel Differentially Private Multimodal Laplacian Dropout (DP-MLD) scheme for multimodal EEG learning. Our approach proposes a novel multimodal representative learning model that processes EEG data by language models as text and other modal data by vision transformers as images, incorporating well-designed cross-attention mechanisms to effectively extract and integrate cross-modal features. To achieve DP, we design a novel adaptive feature-level Laplacian dropout scheme, where randomness allocation and performance are dynamically optimized within given privacy budgets. In the experiment on an open-source multimodal dataset of Freezing of Gait (FoG) in Parkinson's Disease (PD), our proposed method demonstrates an approximate 4\% improvement in classification accuracy, and achieves state-of-the-art performance in multimodal EEG learning under DP.
CVMar 4
TextBoost: Boosting Scene Text Fidelity in Ultra-low Bitrate Image CompressionBingxin Wang, Yuan Lan, Zhaoyi Sun et al.
Ultra-low bitrate image compression faces a critical challenge: preserving small-font scene text while maintaining overall visual quality. Region-of-interest (ROI) bit allocation can prioritize text but often degrades global fidelity, leading to a trade-off between local accuracy and overall image quality. Instead of relying on ROI coding, we incorporate auxiliary textual information extracted by OCR and transmitted with negligible overhead, enabling the decoder to leverage this semantic guidance. Our method, TextBoost, operationalizes this idea through three strategic designs: (i) adaptively filtering OCR outputs and rendering them into a guidance map; (ii) integrating this guidance with decoder features in a calibrated manner via an attention-guided fusion block; and (iii) enforcing guidance-consistent reconstruction in text regions with a regularizing loss that promotes natural blending with the scene. Extensive experiments on TextOCR and ICDAR 2015 demonstrate that TextBoost yields up to 60.6% higher text-recognition F1 at comparable Peak Signal-to-Noise Ratio (PSNR) and bits per pixel (bpp), producing sharper small-font text while preserving global image quality and effectively decoupling text enhancement from global rate-distortion optimization.