Junhua Li

CV
8papers
328citations
Novelty48%
AI Score47

8 Papers

SDApr 11, 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

Guangyong Wei, Zhikui Duan, Shiren Li et al.

In recent years, a great deal of attention has been paid to the Transformer network for speech recognition tasks due to its excellent model performance. However, the Transformer network always involves heavy computation and large number of parameters, causing serious deployment problems in devices with limited computation sources or storage memory. In this paper, a new lightweight model called Sim-T has been proposed to expand the generality of the Transformer model. Under the help of the newly developed multiplexing technique, the Sim-T can efficiently compress the model with negligible sacrifice on its performance. To be more precise, the proposed technique includes two parts, that are, module weight multiplexing and attention score multiplexing. Moreover, a novel decoder structure has been proposed to facilitate the attention score multiplexing. Extensive experiments have been conducted to validate the effectiveness of Sim-T. In Aishell-1 dataset, when the proposed Sim-T is 48% parameter less than the baseline Transformer, 0.4% CER improvement can be obtained. Alternatively, 69% parameter reduction can be achieved if the Sim-T gives the same performance as the baseline Transformer. With regard to the HKUST and WSJ eval92 datasets, CER and WER will be improved by 0.3% and 0.2%, respectively, when parameters in Sim-T are 40% less than the baseline Transformer.

29.0SDApr 1
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

Xiaoxu Zhu, Junhua Li, Aaron J. Li et al.

Self-supervised speech models learn representations that capture both content and speaker information. Yet this entanglement creates problems: content tasks suffer from speaker bias, and privacy concerns arise when speaker identity leaks through supposedly anonymized representations. We present two contributions to address these challenges. First, we develop InterpTRQE-SptME (Timbre Residual Quantitative Evaluation Benchmark of Speech pre-training Models Encoding via Interpretability), a benchmark that directly measures residual speaker information in content embeddings using SHAP-based interpretability analysis. Unlike existing indirect metrics, our approach quantifies the exact proportion of speaker information remaining after disentanglement. Second, we propose InterpTF-SptME, which uses these interpretability insights to filter speaker information from embeddings. Testing on VCTK with seven models including HuBERT, WavLM, and ContentVec, we find that SHAP Noise filtering reduces speaker residuals from 18.05% to nearly zero while maintaining recognition accuracy (CTC loss increase under 1%). The method is model-agnostic and requires no retraining.

NCAug 25, 2025
DLGE: Dual Local-Global Encoding for Generalizable Cross-BCI-Paradigm

Jingyuan Wang, Junhua Li

Deep learning models have been frequently used to decode a single brain-computer interface (BCI) paradigm based on electroencephalography (EEG). It is challenging to decode multiple BCI paradigms using one model due to diverse barriers, such as different channel configurations and disparate task-related representations. In this study, we propose Dual Local-Global Encoder (DLGE), enabling the classification across different BCI paradigms. To address the heterogeneity in EEG channel configurations across paradigms, we employ an anatomically inspired brain-region partitioning and padding strategy to standardize EEG channel configuration. In the proposed model, the local encoder is designed to learn shared features across BCI paradigms within each brain region based on time-frequency information, which integrates temporal attention on individual channels with spatial attention among channels for each brain region. These shared features are subsequently aggregated in the global encoder to form respective paradigm-specific feature representations. Three BCI paradigms (motor imagery, resting state, and driving fatigue) were used to evaluate the proposed model. The results demonstrate that our model is capable of processing diverse BCI paradigms without retraining and retuning, achieving average macro precision, recall, and F1-score of 60.16\%, 59.88\%, and 59.56\%, respectively. We made an initial attempt to develop a general model for cross-BCI-paradigm classification, avoiding retraining or redevelopment for each paradigm. This study paves the way for the development of an effective but simple model for cross-BCI-paradigm decoding, which might benefit the design of portable devices for universal BCI decoding.

QMAug 12, 2025
Cross-BCI, A Cross-BCI-Paradigm Classifica-tion Model Towards Universal BCI Applications

Gaojie Zhou, Junhua Li

Classification models used in brain-computer interface (BCI) are usually designed for a single BCI paradigm. This requires the redevelopment of the model when applying it to a new BCI paradigm, resulting in repeated costs and effort. Moreover, less complex deep learning models are desired for practical usage, as well as for deployment on portable devices. In or-der to fill the above gaps, we, in this study, proposed a light-weight and unified decoding model for cross-BCI-paradigm classification. The proposed model starts with a tempo-spatial convolution. It is followed by a multi-scale local feature selec-tion module, aiming to extract local features shared across BCI paradigms and generate weighted features. Finally, a mul-ti-dimensional global feature extraction module is designed, in which multi-dimensional global features are extracted from the weighted features and fused with the weighted features to form high-level feature representations associated with BCI para-digms. The results, evaluated on a mixture of three classical BCI paradigms (i.e., MI, SSVEP, and P300), demon-strate that the proposed model achieves 88.39%, 82.36%, 80.01%, and 0.8092 for accuracy, macro-precision, mac-ro-recall, and macro-F1-score, respectively, significantly out-performing the compared models. This study pro-vides a feasible solution for cross-BCI-paradigm classifica-tion. It lays a technological foundation for de-veloping a new generation of unified decoding systems, paving the way for low-cost and universal practical applications.

SPNov 22, 2020
Deep Learning in EEG: Advance of the Last Ten-Year Critical Period

Shu Gong, Kaibo Xing, Andrzej Cichocki et al.

Deep learning has achieved excellent performance in a wide range of domains, especially in speech recognition and computer vision. Relatively less work has been done for EEG, but there is still significant progress attained in the last decade. Due to the lack of a comprehensive and topic widely covered survey for deep learning in EEG, we attempt to summarize recent progress to provide an overview, as well as perspectives for future developments. We first briefly mention the artifacts removal for EEG signal and then introduce deep learning models that have been utilized in EEG processing and classification. Subsequently, the applications of deep learning in EEG are reviewed by categorizing them into groups such as brain-computer interface, disease detection, and emotion recognition. They are followed by the discussion, in which the pros and cons of deep learning are presented and future directions and challenges for deep learning in EEG are proposed. We hope that this paper could serve as a summary of past work for deep learning in EEG and the beginning of further developments and achievements of EEG studies based on deep learning.

LGJul 30, 2019
Multi-Kernel Capsule Network for Schizophrenia Identification

Tian Wang, Anastasios Bezerianos, Andrzej Cichocki et al.

Objective: Schizophrenia seriously affects the quality of life. To date, both simple (linear discriminant analysis) and complex (deep neural network) machine learning methods have been utilized to identify schizophrenia based on functional connectivity features. The existing simple methods need two separate steps (i.e., feature extraction and classification) to achieve the identification, which disables simultaneous tuning for the best feature extraction and classifier training. The complex methods integrate two steps and can be simultaneously tuned to achieve optimal performance, but these methods require a much larger amount of data for model training. Methods: To overcome the aforementioned drawbacks, we proposed a multi-kernel capsule network (MKCapsnet), which was developed by considering the brain anatomical structure. Kernels were set to match with partition sizes of brain anatomical structure in order to capture interregional connectivities at the varying scales. With the inspiration of widely-used dropout strategy in deep learning, we developed vector dropout in the capsule layer to prevent overfitting of the model. Results: The comparison results showed that the proposed method outperformed the state-of-the-art methods. Besides, we compared performances using different parameters and illustrated the routing process to reveal characteristics of the proposed method. Conclusion: MKCapsnet is promising for schizophrenia identification. Significance: Our study not only proposed a multi-kernel capsule network but also provided useful information in the parameter setting, which is informative for further studies using a capsule network for neurophysiological signal classification.

CVOct 23, 2014
Canonical Polyadic Decomposition with Auxiliary Information for Brain Computer Interface

Junhua Li, Chao Li, Andrzej Cichocki

Physiological signals are often organized in the form of multiple dimensions (e.g., channel, time, task, and 3D voxel), so it is better to preserve original organization structure when processing. Unlike vector-based methods that destroy data structure, Canonical Polyadic Decomposition (CPD) aims to process physiological signals in the form of multi-way array, which considers relationships between dimensions and preserves structure information contained by the physiological signal. Nowadays, CPD is utilized as an unsupervised method for feature extraction in a classification problem. After that, a classifier, such as support vector machine, is required to classify those features. In this manner, classification task is achieved in two isolated steps. We proposed supervised Canonical Polyadic Decomposition by directly incorporating auxiliary label information during decomposition, by which a classification task can be achieved without an extra step of classifier training. The proposed method merges the decomposition and classifier learning together, so it reduces procedure of classification task compared with that of respective decomposition and classification. In order to evaluate the performance of the proposed method, three different kinds of signals, synthetic signal, EEG signal, and MEG signal, were used. The results based on evaluations of synthetic and real signals demonstrated that the proposed method is effective and efficient.

CVOct 3, 2014
Feature Learning from Incomplete EEG with Denoising Autoencoder

Junhua Li, Zbigniew Struzik, Liqing Zhang et al.

An alternative pathway for the human brain to communicate with the outside world is by means of a brain computer interface (BCI). A BCI can decode electroencephalogram (EEG) signals of brain activities, and then send a command or an intent to an external interactive device, such as a wheelchair. The effectiveness of the BCI depends on the performance in decoding the EEG. Usually, the EEG is contaminated by different kinds of artefacts (e.g., electromyogram (EMG), background activity), which leads to a low decoding performance. A number of filtering methods can be utilized to remove or weaken the effects of artefacts, but they generally fail when the EEG contains extreme artefacts. In such cases, the most common approach is to discard the whole data segment containing extreme artefacts. This causes the fatal drawback that the BCI cannot output decoding results during that time. In order to solve this problem, we employ the Lomb-Scargle periodogram to estimate the spectral power from incomplete EEG (after removing only parts contaminated by artefacts), and Denoising Autoencoder (DAE) for learning. The proposed method is evaluated with motor imagery EEG data. The results show that our method can successfully decode incomplete EEG to good effect.