CV | Jun 22, 2022 | Code
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao, Haobo Chen, Wenyuan Zhang et al.
Natural language (NL) based vehicle retrieval aims to search for a specific vehicle given a text description. Unlike image-based vehicle retrieval, NL-based vehicle retrieval requires considering not only vehicle appearance but also the surrounding environment and temporal relations. In this paper, we propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for NL-based vehicle retrieval. Specifically, we design a symmetric network to learn unified cross-modal representations between text descriptions and vehicle images, where vehicle appearance details and global vehicle trajectory information are preserved. Besides, to make better use of location information, we propose a spatial relationship modeling method that takes the surrounding environment and the mutual relationships between vehicles into consideration. Qualitative and quantitative experiments verify the effectiveness of the proposed method. We achieve 43.92% MRR accuracy on the test set of the 6th AI City Challenge natural language-based vehicle retrieval track, yielding 1st place among all valid submissions on the public leaderboard. The code is available at https://github.com/hbchen121/AICITY2022_Track2_SSM.
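For readers unfamiliar with the metric reported in this abstract, mean reciprocal rank (MRR) averages, over all text queries, the reciprocal of the rank at which the correct vehicle appears. The following is a minimal Python sketch of that standard definition; the function name, data layout, and toy IDs are illustrative assumptions and are not taken from the SSM repository or the challenge's evaluation script.

```python
from typing import Hashable, Mapping, Sequence


def mean_reciprocal_rank(
    rankings: Mapping[Hashable, Sequence[Hashable]],
    targets: Mapping[Hashable, Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries.

    rankings: query id -> candidate ids, best match first (hypothetical layout).
    targets:  query id -> the single correct candidate id.
    A query whose target is missing from its ranking contributes 0.
    """
    total = 0.0
    for query_id, ranked in rankings.items():
        target = targets[query_id]
        ranked_list = list(ranked)
        if target in ranked_list:
            rank = ranked_list.index(target) + 1  # ranks are 1-based
            total += 1.0 / rank
    return total / len(rankings)


# Toy example: three text queries ranked against three vehicle tracks.
rankings = {
    "q1": ["v3", "v1", "v2"],  # correct track v1 at rank 2
    "q2": ["v2", "v1", "v3"],  # correct track v2 at rank 1
    "q3": ["v1", "v3", "v2"],  # correct track v2 at rank 3
}
targets = {"q1": "v1", "q2": "v2", "q3": "v2"}

mrr = mean_reciprocal_rank(rankings, targets)
print(f"MRR = {mrr:.4f}")
```

In the toy example the reciprocal ranks are 1/2, 1, and 1/3, so the mean is 11/18.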
SP | Aug 28, 2024 | Code
Brant-X: A Unified Physiological Signal Alignment Framework
Daoze Zhang, Zhizhang Yuan, Junru Chen et al.
Physiological signals serve as indispensable clues for understanding various physiological states of the human body. Most existing works have focused on a single type of physiological signal for a range of application scenarios. However, as the body is a holistic biological system, the inherent interconnection among various physiological data should not be neglected. In particular, given the brain's role as the control center for vital activities, electroencephalogram (EEG) exhibits significant correlations with other physiological signals. Therefore, the correlation between EEG and other physiological signals holds potential to improve performance in various scenarios. Nevertheless, achieving this goal is still constrained by several challenges: the scarcity of simultaneously collected physiological data, the differences in correlations between various signals, and the correlation differences between various tasks. To address these issues, we propose a unified physiological signal alignment framework, Brant-X, to model the correlation between EEG and other signals. Our approach (1) employs the EEG foundation model to data-efficiently transfer the rich knowledge in EEG to other physiological signals, and (2) introduces a two-level alignment to fully align the semantics of EEG and other signals at different semantic scales. In the experiments, Brant-X achieves state-of-the-art performance compared with task-agnostic and task-specific baselines on various downstream tasks in diverse scenarios, including sleep stage classification, emotion recognition, freezing of gait detection, and eye movement communication. Moreover, the analysis on the arrhythmia detection task and the visualization in the case study further illustrate the effectiveness of Brant-X in transferring knowledge from EEG to other physiological signals. The model's homepage is at https://github.com/zjunet/Brant-X/.
SP | Jun 15, 2023
BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning
Junru Chen, Yang Yang, Tao Yu et al.
Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectroencephalography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with an extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several recent state-of-the-art baselines derived from time-series analysis.
AI | Jul 31, 2024 | Code
Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification
Junru Chen, Tianyu Cao, Jing Xu et al.
Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models rely on the assumption of independent and identically distributed (i.i.d.), focusing on independently modeling each segment. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD. The source code is available at https://github.com/MrNobodyCali/Con4m.
NC | Dec 10, 2025 | Code
NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization
Gaorui Zhang, Zhizhang Yuan, Jialan Yang et al.
Neural decoding, a critical component of Brain-Computer Interface (BCI), has recently attracted increasing research interest. Previous research has focused on leveraging signal processing and deep learning methods to enhance neural decoding performance. However, model architectures remain underexplored, despite the proven effectiveness of architectural exploration in other tasks such as energy forecasting and image classification. In this study, we propose NeuroSketch, an effective framework for neural decoding via systematic architecture optimization. Starting with the basic architecture study, we find that CNN-2D outperforms other architectures in neural decoding tasks and explore its effectiveness from temporal and spatial perspectives. Building on this, we optimize the architecture from macro- to micro-level, achieving improvements in performance at each step. The exploration process and model validation comprise over 5,000 experiments spanning three distinct modalities (visual, auditory, and speech), three types of brain signals (EEG, SEEG, and ECoG), and eight diverse decoding tasks. Experimental results indicate that NeuroSketch achieves state-of-the-art (SOTA) performance across all evaluated datasets, positioning it as a powerful tool for neural decoding. Our code and scripts are available at https://github.com/Galaxy-Dawn/NeuroSketch.
SP | Jun 15, 2023
MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals
Donghong Cai, Junru Chen, Yang Yang et al.
Brain signals are important quantitative data for understanding physiological activities and diseases of the human brain. Most existing studies pay attention to supervised learning methods, which, however, require high-cost clinical labels. In addition, the huge difference in the clinical patterns of brain signals measured by invasive (e.g., SEEG) and non-invasive (e.g., EEG) methods leads to the lack of a unified method. To handle the above issues, we propose to study a self-supervised learning (SSL) framework for brain signals that can be applied to pre-train on either SEEG or EEG data. Intuitively, brain signals, generated by the firing of neurons, are transmitted among different connecting structures in the human brain. Inspired by this, we propose MBrain to learn implicit spatial and temporal correlations between different channels (i.e., contacts of the electrode, corresponding to different brain areas) as the cornerstone for uniformly modeling different types of brain signals. Specifically, we represent the spatial correlation by a graph structure, which is built with the proposed multi-channel CPC. We theoretically prove that optimizing the goal of multi-channel CPC can lead to a better predictive representation and apply the instantaneous-time-shift prediction task based on it. Then we capture the temporal correlation by designing the delayed-time-shift prediction task. Finally, a replace-discriminative-learning task is proposed to preserve the characteristics of each channel. Extensive experiments on seizure detection with large-scale real-world EEG and SEEG datasets demonstrate that our model outperforms several state-of-the-art time series SSL and unsupervised models, and has the ability to be deployed in clinical practice.
CV | Sep 12, 2022
Style Variable and Irrelevant Learning for Generalizable Person Re-identification
Haobo Chen, Chuyang Zhao, Kai Tu et al.
Because supervised person re-identification (ReID) performs poorly on unseen domains, Domain Generalization (DG) person ReID, which aims to learn a domain-insensitive model that resists the influence of domain bias, has recently attracted a lot of attention. In this paper, we first verify through an experiment that style factors are a vital part of domain bias. Based on this conclusion, we propose a Style Variable and Irrelevant Learning (SVIL) method to eliminate the effect of style factors on the model. Specifically, we design a Style Jitter Module (SJM) in SVIL. The SJM can enrich the style diversity of a specific source domain and reduce the style differences among the various source domains. This leads the model to focus on identity-relevant information and to be insensitive to style changes. Besides, we organically combine the SJM with a meta-learning algorithm, maximizing the benefits and further improving the generalization ability of the model. Note that our SJM is plug-and-play and adds no inference cost. Extensive experiments confirm the effectiveness of SVIL, and our method outperforms the state-of-the-art methods on DG-ReID benchmarks by a large margin.
NC | Jan 28
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
Jiahe Li, Junru Chen, Fanqi Shen et al.
Enabling natural communication through brain-computer interfaces (BCIs) remains one of the most profound challenges in neuroscience and neurotechnology. While existing frameworks offer partial solutions, they are constrained by oversimplified semantic representations and a lack of interpretability. To overcome these limitations, we introduce Semantic Intent Decoding (SID), a novel framework that translates neural activity into natural language by modeling meaning as a flexible set of compositional semantic units. SID is built on three core principles: semantic compositionality, continuity and expandability of semantic space, and fidelity in reconstruction. We present BrainMosaic, a deep learning architecture implementing SID. BrainMosaic decodes multiple semantic units from EEG/SEEG signals using set matching and then reconstructs coherent sentences through semantic-guided reconstruction. This approach moves beyond traditional pipelines that rely on fixed-class classification or unconstrained generation, enabling a more interpretable and expressive communication paradigm. Extensive experiments on multilingual EEG and clinical SEEG datasets demonstrate that SID and BrainMosaic offer substantial advantages over existing frameworks, paving the way for natural and effective BCI-mediated communication.
LG | Feb 1, 2024
Are Synthetic Time-series Data Really not as Good as Real Data?
Fanzhe Fu, Junru Chen, Jing Zhang et al.
Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems. Integrating universal data synthesis methods holds promise in improving generalization. However, current methods cannot guarantee that the generator's output covers all unseen real data. In this paper, we introduce InfoBoost, a highly versatile cross-domain data synthesizing framework with time series representation learning capability. We have developed a method based on synthetic data that enables model training without the need for real data, surpassing the performance of models trained with real data. Additionally, we have trained a universal feature extractor based on our synthetic data that is applicable to all time-series data. Our approach overcomes interference from multi-source rhythmic signals, noise interference, and long-period features that exceed the capabilities of the sampling window. Through experiments, our non-deep-learning synthetic data enables models to achieve superior reconstruction performance and universal explicit representation extraction without the need for real data.
NC | Feb 24, 2025
Deep Learning-Powered Electrical Brain Signals Analysis: Advancing Neurological Diagnostics
Jiahe Li, Xin Chen, Fanqi Shen et al.
Neurological disorders pose major global health challenges, driving advances in brain signal analysis. Scalp electroencephalography (EEG) and intracranial EEG (iEEG) are widely used for diagnosis and monitoring. However, dataset heterogeneity and task variations hinder the development of robust deep learning solutions. This review systematically examines recent advances in deep learning approaches for EEG/iEEG-based neurological diagnostics, focusing on applications across 7 neurological conditions using 46 datasets. For each condition, we review representative methods and their quantitative results, integrating performance comparisons with analyses of data usage, model design, and task-specific adaptations, while highlighting the role of pre-trained multi-task models in achieving scalable, generalizable solutions. Finally, we propose a standardized benchmark to evaluate models across diverse datasets and improve reproducibility, emphasizing how recent innovations are transforming neurological diagnostics toward intelligent, adaptable healthcare systems.
AI | May 23, 2025
PD$^3$: A Project Duplication Detection Framework via Adapted Multi-Agent Debate
Dezheng Bao, Yueci Yang, Xin Chen et al.
Project duplication detection is critical for project quality assessment, as it improves resource utilization efficiency by preventing investment in newly proposed projects that have already been studied. It requires the ability to understand high-level semantics and generate constructive and valuable feedback. Existing detection methods rely on basic word- or sentence-level comparison or solely apply large language models, lacking valuable insights for experts and in-depth comprehension of project content and review criteria. To tackle this issue, we propose PD$^3$, a Project Duplication Detection framework via adapted multi-agent Debate. Inspired by real-world expert debates, it employs a fair competition format to guide multi-agent debate to retrieve relevant projects. For feedback, it incorporates both qualitative and quantitative analysis to improve its practicality. Data from over 800 real-world power projects spanning more than 20 specialized fields are used to evaluate the framework, demonstrating that our method outperforms existing approaches by 7.43% and 8.00% in two downstream tasks. Furthermore, we establish an online platform, Review Dingdang, to assist power experts, saving 5.73 million USD in initial detection on more than 100 newly proposed projects.
CV | Apr 30, 2021
Vehicle Re-identification Method Based on Vehicle Attribute and Mutual Exclusion Between Cameras
Junru Chen, Shiqing Geng, Yongluan Yan et al.
Vehicle re-identification aims to identify a specific vehicle across time and camera views. With the rapid growth of intelligent transportation systems and smart cities, vehicle re-identification technology is attracting more and more attention. However, due to differences in shooting angle and the high similarity of vehicles belonging to the same brand, vehicle re-identification remains a great challenge for existing methods. In this paper, we propose a vehicle attribute-guided method to re-rank vehicle Re-ID results. The attributes used include vehicle orientation and vehicle brand. We also focus on the camera information and introduce camera mutual exclusion theory to further fine-tune the search results. In terms of feature extraction, we combine multi-resolution data augmentation with a large model ensemble to obtain more robust vehicle features. Our method achieves an mAP of 63.73% and a rank-1 accuracy of 76.61% in the CVPR 2021 AI City Challenge.
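The two metrics quoted in this abstract, mAP and rank-1 accuracy, can be illustrated with a short Python sketch. It uses a common textbook definition of average precision over a fully ranked gallery; the function names and toy data are illustrative assumptions, and the AI City Challenge's own evaluation protocol (for example, any truncation of the ranked list) may differ.

```python
from typing import List, Sequence, Tuple


def average_precision(relevance: Sequence[bool]) -> float:
    """AP for one query; relevance[i] is True if the i-th ranked item matches."""
    hits = 0
    precision_sum = 0.0
    for position, is_match in enumerate(relevance, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / position  # precision at this hit
    return precision_sum / hits if hits else 0.0


def map_and_rank1(ranked_relevance: List[Sequence[bool]]) -> Tuple[float, float]:
    """Mean AP and rank-1 accuracy over a set of queries."""
    n = len(ranked_relevance)
    mean_ap = sum(average_precision(r) for r in ranked_relevance) / n
    # Rank-1 accuracy: fraction of queries whose top result is a match.
    rank1 = sum(1.0 for r in ranked_relevance if r and r[0]) / n
    return mean_ap, rank1


# Toy example with two queries (hypothetical match flags, best match first).
example = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
mean_ap, rank1 = map_and_rank1(example)
print(f"mAP = {mean_ap:.4f}, rank-1 = {rank1:.2f}")
```

Rank-1 only looks at the top result, while mAP rewards placing every matching gallery image early, which is why re-ranking steps typically target mAP.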
LG | Dec 4, 2020
Unsupervised Adversarially-Robust Representation Learning on Graphs
Jiarong Xu, Yang Yang, Junru Chen et al.
Unsupervised/self-supervised pre-training methods for graph representation learning have recently attracted increasing research interest, and they have been shown to generalize to various downstream applications. Yet, the adversarial robustness of such pre-trained graph learning models remains largely unexplored. More importantly, most existing defense techniques designed for end-to-end graph representation learning methods require pre-specified label definitions, and thus cannot be directly applied to the pre-training methods. In this paper, we propose an unsupervised defense technique to robustify pre-trained deep graph models, so that the perturbations on the input graph can be successfully identified and blocked before the model is applied to different downstream tasks. Specifically, we introduce a mutual information-based measure, graph representation vulnerability (GRV), to quantify the robustness of graph encoders on the representation space. We then formulate an optimization problem to learn the graph representation by carefully balancing the trade-off between the expressive power and the robustness (i.e., GRV) of the graph encoder. The discrete nature of graph topology and the joint space of graph data make the optimization problem intractable to solve. To handle the above difficulty and to reduce computational expense, we further relax the problem and thus provide an approximate solution. Additionally, we explore a provable connection between the robustness of the unsupervised graph encoder and that of models on downstream tasks. Extensive experiments demonstrate that even without access to labels and tasks, our model is still able to enhance robustness against adversarial attacks on three downstream tasks (node classification, link prediction, and community detection) by an average of +16.5% compared with existing methods.
CV | May 13, 2020
Attribute-guided Feature Extraction and Augmentation Robust Learning for Vehicle Re-identification
Chaoran Zhuge, Yujie Peng, Yadong Li et al.
Vehicle re-identification is one of the core technologies of intelligent transportation systems and smart cities, but large intra-class diversity and inter-class similarity pose great challenges for existing methods. In this paper, we propose a multi-guided learning approach that utilizes attribute information while introducing two novel random augmentations to improve robustness during training. In addition, we propose an attribute constraint method and a group re-ranking strategy to refine matching results. Our method achieves an mAP of 66.83% and a rank-1 accuracy of 76.05% in the CVPR 2020 AI City Challenge.