Zhiwei Guo

AI
h-index35
6papers
155citations
Novelty59%
AI Score48

6 Papers

CLJan 1
DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Yuxin Li, Xiangyu Zhang, Yifei Li et al.

Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity while preserving content and speaker identity. Third, a prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum. Using DepFlow, we construct a Camouflage Depression-oriented Augmentation (CDoA) dataset that pairs depressed acoustic patterns with positive/neutral content from a sentiment-stratified text bank, creating acoustic-semantic mismatches underrepresented in natural data. Evaluated across three depression detection architectures, CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, consistently outperforming conventional augmentation strategies in depression Detection. Beyond enhancing robustness, DepFlow provides a controllable synthesis platform for conversational systems and simulation-based evaluation, where real clinical data remains limited by ethical and coverage constraints.

SPApr 2, 2025Code
Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer

Muyun Jiang, Yi Ding, Wei Zhang et al.

Covert speech involves imagining speaking without audible sound or any movements. Decoding covert speech from electroencephalogram (EEG) is challenging due to a limited understanding of neural pronunciation mapping and the low signal-to-noise ratio of the signal. In this study, we developed a large-scale multi-utterance speech EEG dataset from 57 right-handed native English-speaking subjects, each performing covert and overt speech tasks by repeating the same word in five utterances within a ten-second duration. Given the spatio-temporal nature of the neural activation process during speech pronunciation, we developed a Functional Areas Spatio-temporal Transformer (FAST), an effective framework for converting EEG signals into tokens and utilizing transformer architecture for sequence encoding. Our results reveal distinct and interpretable speech neural features by the visualization of FAST-generated activation maps across frontal and temporal brain regions with each word being covertly spoken, providing new insights into the discriminative features of the neural representation of covert speech. This is the first report of such a study, which provides interpretable evidence for speech decoding from EEG. The code for this work has been made public at https://github.com/Jiang-Muyun/FAST

LGSep 29, 2025
ELASTIQ: EEG-Language Alignment with Semantic Task Instruction and Querying

Muyun Jiang, Shuailei Zhang, Zhenjie Yang et al.

Recent advances in electroencephalography (EEG) foundation models, which capture transferable EEG representations, have greatly accelerated the development of brain-computer interfaces (BCI). However, existing approaches still struggle to incorporate language instructions as prior constraints for EEG representation learning, limiting their ability to leverage the semantic knowledge inherent in language to unify different labels and tasks. To address this challenge, we present ELASTIQ, a foundation model for EEG-Language Alignment with Semantic Task Instruction and Querying. ELASTIQ integrates task-aware semantic guidance to produce structured and linguistically aligned EEG embeddings, thereby enhancing decoding robustness and transferability. In the pretraining stage, we introduce a joint Spectral-Temporal Reconstruction (STR) module, which combines frequency masking as a global spectral perturbation with two complementary temporal objectives: random masking to capture contextual dependencies and causal masking to model sequential dynamics. In the instruction tuning stage, we propose the Instruction-conditioned Q-Former (IQF), a query-based cross-attention transformer that injects instruction embeddings into EEG tokens and aligns them with textual label embeddings through learnable queries. We evaluate ELASTIQ on 20 datasets spanning motor imagery, emotion recognition, steady-state visual evoked potentials, covert speech, and healthcare tasks. ELASTIQ achieves state-of-the-art performance on 14 of the 20 datasets and obtains the best average results across all five task categories. Importantly, our analyses reveal for the first time that explicit task instructions serve as semantic priors guiding EEG embeddings into coherent and linguistically grounded spaces. The code and pre-trained weights will be released.

CVJun 28, 2025
Deep Learning based Joint Geometry and Attribute Up-sampling for Large-Scale Colored Point Clouds

Yun Zhang, Feifan Chen, Na Li et al.

Colored point cloud, which includes geometry and attribute components, is a mainstream representation enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method that learns to model both geometry and attribute patterns while leveraging spatial attribute correlations. First, we establish and release a large-scale dataset for colored point cloud up-sampling called SYSU-PCUD, containing 121 large-scale colored point clouds with diverse geometry and attribute complexities across six categories and four sampling rates. Second, to improve the quality of up-sampled point clouds, we propose a deep learning-based JGAU framework that jointly up-samples geometry and attributes. It consists of a geometry up-sampling network and an attribute up-sampling network, where the latter leverages the up-sampled auxiliary geometry to model neighborhood correlations of the attributes. Third, we propose two coarse attribute up-sampling methods, Geometric Distance Weighted Attribute Interpolation (GDWAI) and Deep Learning-based Attribute Interpolation (DLAI), to generate coarse up-sampled attributes for each point. Then, an attribute enhancement module is introduced to refine these up-sampled attributes and produce high-quality point clouds by further exploiting intrinsic attribute and geometry patterns. Extensive experiments show that the Peak Signal-to-Noise Ratio (PSNR) achieved by the proposed JGAU method is 33.90 decibels, 32.10 decibels, 31.10 decibels, and 30.39 decibels for up-sampling rates of 4 times, 8 times, 12 times, and 16 times, respectively. Compared to state-of-the-art methods, JGAU achieves average PSNR gains of 2.32 decibels, 2.47 decibels, 2.28 decibels, and 2.11 decibels at these four up-sampling rates, demonstrating significant improvement.

AIApr 23, 2021
Secure Artificial Intelligence of Things for Implicit Group Recommendations

Keping Yu, Zhiwei Guo, Yu Shen et al.

The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As distance among people has been greatly shortened, it has been a more general demand to provide personalized services to groups instead of individuals. In order to capture group-level preference features from individuals, existing methods were mostly established via aggregation and face two aspects of challenges: secure data management workflow is absent, and implicit preference feedbacks is ignored. To tackle current difficulties, this paper proposes secure Artificial Intelligence of Things for implicit Group Recommendations (SAIoT-GR). As for hardware module, a secure IoT structure is developed as the bottom support platform. As for software module, collaborative Bayesian network model and non-cooperative game are can be introduced as algorithms. Such a secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of the SAIoT-GR in terms of efficiency and robustness.

IRJan 29, 2021
Implicit Feedback-based Group Recommender System for Internet of Thing Applications

Zhiwei Guo, Keping Yu, Tan Guo et al.

With the prevalence of Internet of Things (IoT)-based social media applications, the distance among people has been greatly shortened. As a result, recommender systems in IoT-based social media need to be developed oriented to groups of users rather than individual users. However, existing methods were highly dependent on explicit preference feedbacks, ignoring scenarios of implicit feedback. To remedy such gap, this paper proposes an implicit feedback-based group recommender system using probabilistic inference and non-cooperative game(GREPING) for IoT-based social media. Particularly, unknown process variables can be estimated from observable implicit feedbacks via Bayesian posterior probability inference. In addition, the globally optimal recommendation results can be calculated with the aid of non-cooperative game. Two groups of experiments are conducted to assess the GREPING from two aspects: efficiency and robustness. Experimental results show obvious promotion and considerable stability of the GREPING compared to baseline methods.