Ying Tan

LG
h-index5
40papers
987citations
Novelty48%
AI Score56

40 Papers

NCMar 31, 2023Code
Temporal Dynamic Synchronous Functional Brain Network for Schizophrenia Diagnosis and Lateralization Analysis

Cheng Zhu, Ying Tan, Shuqi Yang et al.

The available evidence suggests that dynamic functional connectivity (dFC) can capture time-varying abnormalities in brain activity in resting-state cerebral functional magnetic resonance imaging (rs-fMRI) data and has a natural advantage in uncovering mechanisms of abnormal brain activity in schizophrenia(SZ) patients. Hence, an advanced dynamic brain network analysis model called the temporal brain category graph convolutional network (Temporal-BCGCN) was employed. Firstly, a unique dynamic brain network analysis module, DSF-BrainNet, was designed to construct dynamic synchronization features. Subsequently, a revolutionary graph convolution method, TemporalConv, was proposed, based on the synchronous temporal properties of feature. Finally, the first modular abnormal hemispherical lateralization test tool in deep learning based on rs-fMRI data, named CategoryPool, was proposed. This study was validated on COBRE and UCLA datasets and achieved 83.62% and 89.71% average accuracies, respectively, outperforming the baseline model and other state-of-the-art methods. The ablation results also demonstrate the advantages of TemporalConv over the traditional edge feature graph convolution approach and the improvement of CategoryPool over the classical graph pooling approach. Interestingly, this study showed that the lower order perceptual system and higher order network regions in the left hemisphere are more severely dysfunctional than in the right hemisphere in SZ and reaffirms the importance of the left medial superior frontal gyrus in SZ. Our core code is available at: https://github.com/swfen/Temporal-BCGCN.

CVJun 22, 2022Code
ProtoCLIP: Prototypical Contrastive Language Image Pretraining

Delong Chen, Zhao Wu, Fan Liu et al.

Contrastive Language Image Pretraining (CLIP) has received widespread attention, since its learned representations can be transferred well to various downstream tasks. During the training process of the CLIP model, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. Based on this understanding, in this paper, Prototypical Contrastive Language Image Pretraining (ProtoCLIP) is introduced to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher-level structural knowledge. Further, Prototypical Back Translation (PBT) is proposed to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under large modality gap. The PBT also enables us to introduce additional external teachers with richer prior language knowledge. ProtoCLIP is trained with an online episodic training strategy, which makes it can be scaled up to unlimited amounts of data. We train our ProtoCLIP on Conceptual Captions and achieved an +5.81% ImageNet linear probing improvement and an +2.01% ImageNet zero-shot classification improvement. On the larger YFCC-15M dataset, ProtoCLIP matches the performance of CLIP with 33% of training time. Codes are available at https://github.com/megvii-research/protoclip.

CVApr 17, 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning

Yang Liu, Ying Tan, Jingzhou Luo et al.

Existing visual question reasoning methods usually fail to explicitly discover the inherent causal mechanism and ignore jointly modeling cross-modal event temporality and causality. In this paper, we propose a visual question reasoning framework named Cross-Modal Question Reasoning (CMQR), to discover temporal causal structure and mitigate visual spurious correlation by causal intervention. To explicitly discover visual causal structure, the Visual Causality Discovery (VCD) architecture is proposed to find question-critical scene temporally and disentangle the visual spurious correlations by attention-based front-door causal intervention module named Local-Global Causal Attention Module (LGCAM). To align the fine-grained interactions between linguistic semantics and spatial-temporal representations, we build an Interactive Visual-Linguistic Transformer (IVLT) that builds the multi-modal co-occurrence interactions between visual and linguistic content. Extensive experiments on four datasets demonstrate the superiority of CMQR for discovering visual causal structures and achieving robust question reasoning.

CVApr 28, 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition

Yang Liu, Ying Tan, Haoyuan Lan

The underlying correlation between audio and visual modalities can be utilized to learn supervised information for unlabeled videos. In this paper, we propose an end-to-end self-supervised framework named Audio-Visual Contrastive Learning (AVCL), to learn discriminative audio-visual representations for action recognition. Specifically, we design an attention based multi-modal fusion module (AMFM) to fuse audio and visual modalities. To align heterogeneous audio-visual modalities, we construct a novel co-correlation guided representation alignment module (CGRA). To learn supervised information from unlabeled videos, we propose a novel self-supervised contrastive learning module (SelfCL). Furthermore, we build a new audio-visual action recognition dataset named Kinetics-Sounds100. Experimental results on Kinetics-Sounds32 and Kinetics-Sounds100 datasets demonstrate the superiority of our AVCL over the state-of-the-art methods on large-scale action recognition benchmark.

CVMar 28, 2022
Optimization of Directional Landmark Deployment for Visual Observer on SE(3)

Zike Lei, Xi Chen, Ying Tan et al.

An optimization method is proposed in this paper for novel deployment of given number of directional landmarks (location and pose) within a given region in the 3-D task space. This new deployment technique is built on the geometric models of both landmarks and the monocular camera. In particular, a new concept of Multiple Coverage Probability (MCP) is defined to characterize the probability of at least n landmarks being covered simultaneously by a camera at a fixed position. The optimization is conducted with respect to the position and pose of the given number of landmarks to maximize MCP through globally exploration of the given 3-D space. By adopting the elimination genetic algorithm, the global optimal solutions can be obtained, which are then applied to improve the convergent performance of the visual observer on SE(3) as a demonstration example. Both simulation and experimental results are presented to validate the effectiveness of the proposed landmark deployment optimization method.

CVAug 23, 2023Code
Edge-aware Hard Clustering Graph Pooling for Brain Imaging

Cheng Zhu, Jiayi Zhu, Xi Wu et al.

Graph Convolutional Networks (GCNs) can capture non-Euclidean spatial dependence between different brain regions. The graph pooling operator, a crucial element of GCNs, enhances the representation learning capability and facilitates the acquisition of abnormal brain maps. However, most existing research designs graph pooling operators solely from the perspective of nodes while disregarding the original edge features. This confines graph pooling application scenarios and diminishes its ability to capture critical substructures. In this paper, we propose a novel edge-aware hard clustering graph pool (EHCPool), which is tailored to dominant edge features and redefines the clustering process. EHCPool initially introduced the 'Edge-to-Node' score criterion which utilized edge information to evaluate the significance of nodes. An innovative Iteration n-top strategy was then developed, guided by edge scores, to adaptively learn sparse hard clustering assignments for graphs. Additionally, a N-E Aggregation strategy is designed to aggregate node and edge features in each independent subgraph. Extensive experiments on the multi-site public datasets demonstrate the superiority and robustness of the proposed model. More notably, EHCPool has the potential to probe different types of dysfunctional brain networks from a data-driven perspective. Method code: https://github.com/swfen/EHCPool

ROSep 15, 2023
Data-Driven Goal Recognition in Transhumeral Prostheses Using Process Mining Techniques

Zihang Su, Tianshi Yu, Nir Lipovetzky et al.

A transhumeral prosthesis restores missing anatomical segments below the shoulder, including the hand. Active prostheses utilize real-valued, continuous sensor data to recognize patient target poses, or goals, and proactively move the artificial limb. Previous studies have examined how well the data collected in stationary poses, without considering the time steps, can help discriminate the goals. In this case study paper, we focus on using time series data from surface electromyography electrodes and kinematic sensors to sequentially recognize patients' goals. Our approach involves transforming the data into discrete events and training an existing process mining-based goal recognition system. Results from data collected in a virtual reality setting with ten subjects demonstrate the effectiveness of our proposed goal recognition approach, which achieves significantly better precision and recall than the state-of-the-art machine learning techniques and is less confident when wrong, which is beneficial when approximating smoother movements of prostheses.

CVAug 25, 2024
Multi-SIGATnet: A multimodal schizophrenia MRI classification algorithm using sparse interaction mechanisms and graph attention networks

Yuhong Jiao, Jiaqing Miao, Jinnan Gong et al.

Schizophrenia is a serious psychiatric disorder. Its pathogenesis is not completely clear, making it difficult to treat patients precisely. Because of the complicated non-Euclidean network structure of the human brain, learning critical information from brain networks remains difficult. To effectively capture the topological information of brain neural networks, a novel multimodal graph attention network based on sparse interaction mechanism (Multi-SIGATnet) was proposed for SZ classification was proposed for SZ classification. Firstly, structural and functional information were fused into multimodal data to obtain more comprehensive and abundant features for patients with SZ. Subsequently, a sparse interaction mechanism was proposed to effectively extract salient features and enhance the feature representation capability. By enhancing the strong connections and weakening the weak connections between feature information based on an asymmetric convolutional network, high-order interactive features were captured. Moreover, sparse learning strategies were designed to filter out redundant connections to improve model performance. Finally, local and global features were updated in accordance with the topological features and connection weight constraints of the higher-order brain network, the features being projected to the classification target space for disorder classification. The effectiveness of the model is verified on the Center for Biomedical Research Excellence (COBRE) and University of California Los Angeles (UCLA) datasets, achieving 81.9\% and 75.8\% average accuracy, respectively, 4.6\% and 5.5\% higher than the graph attention network (GAT) method. Experiments showed that the Multi-SIGATnet method exhibited good performance in identifying SZ.

LGJul 12, 2025Code
Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications

Dunsheng Huang, Dong Shen, Lei Lu et al.

Wavelet neural network (WNN), which learns an unknown nonlinear mapping from the data, has been widely used in signal processing, and time-series analysis. However, challenges in constructing accurate wavelet bases and high computational costs limit their application. This study introduces a constructive WNN that selects initial bases and trains functions by introducing new bases for predefined accuracy while reducing computational costs. For the first time, we analyze the frequency of unknown nonlinear functions and select appropriate initial wavelets based on their primary frequency components by estimating the energy of the spatial frequency component. This leads to a novel constructive framework consisting of a frequency estimator and a wavelet-basis increase mechanism to prioritize high-energy bases, significantly improving computational efficiency. The theoretical foundation defines the necessary time-frequency range for high-dimensional wavelets at a given accuracy. The framework's versatility is demonstrated through four examples: estimating unknown static mappings from offline data, combining two offline datasets, identifying time-varying mappings from time-series data, and capturing nonlinear dependencies in real time-series data. These examples showcase the framework's broad applicability and practicality. All the code will be released at https://github.com/dshuangdd/CWNN.

CVMay 6
A Breast Vision Pathology Foundation Model for Real-world Clinical Utility

Yingxue Xu, Zhengyu Zhang, Xiuming Zhang et al.

Pathology foundation models have shown strong retrospective performance, but whether such systems can support clinically relevant use remains unclear. This challenge is particularly important in breast cancer, where pathological assessment serves as the gold standard for diagnosis and guides treatment planning, surgical decision-making and risk stratification across pre-, intra- and post-operative stages. Here we present \textbf{BRAVE}, a breast-adaptive pathology foundation model developed and evaluated using a total resource of 101,638 breast whole-slide images from 32 sources across Asia, Europe and North America. We assessed BRAVE across 34 tasks in 82 cohorts spanning pre-operative biopsy, intra-operative frozen section and post-operative resection, using an evidence chain comprising retrospective benchmarking, clinically challenging scenarios, workflow-oriented clinical impact simulations, prospective observational validation with the thresholds locked in the retrospective cohorts and crossover pathologist-AI interaction studies. Across these settings, BRAVE supported practical roles in the clinical workflow, including safe exclusion of low-risk cases from routine review, AI-assisted second-review rescue of initially missed positives and prioritization of cases for further assessment. In prospective validation across three centres, BRAVE excluded 76.9% of negative biopsy cases (NPV 0.953) and 70.1% of negative frozen-section cases (NPV 0.973), and triaged 78.8% of post-operative subtyping cases as high-confidence clear-cut cases (NPV 1.000). In reader studies, AI assistance improved balanced accuracy from 88.5% to 95.1% (OR 3.14, P<0.001), with better efficiency, confidence and inter-rater agreement. BRAVE-derived scores also independently predicted disease-free survival (adjusted HR 4.79, P<0.001) and overall survival (adjusted HR 8.14, P<0.001).

CLJan 8
Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning

Feihu Jin, Shipeng Cen, Ying Tan

Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead during backpropagation remains a critical bottleneck, especially as model scales grow. Zeroth-order (ZO) optimization alleviates this issue by estimating gradients through forward passes and Gaussian sampling, avoiding the need for backpropagation. However, conventional ZO methods suffer from high variance in gradient estimation due to their reliance on random perturbations, leading to slow convergence and suboptimal performance. We propose a simple plug-and-play method that incorporates prior-informed perturbations to refine gradient estimation. Our method dynamically computes a guiding vector from Gaussian samples, which directs perturbations toward more informative directions, significantly accelerating convergence compared to standard ZO approaches. We further investigate a greedy perturbation strategy to explore the impact of prior knowledge on gradient estimation. Theoretically, we prove that our gradient estimator achieves stronger alignment with the true gradient direction, enhancing optimization efficiency. Extensive experiments across LLMs of varying scales and architectures demonstrate that our proposed method could seamlessly integrate into existing optimization methods, delivering faster convergence and superior performance. Notably, on the OPT-13B model, our method outperforms traditional ZO optimization across all 11 benchmark tasks and surpasses gradient-based baselines on 9 out of 11 tasks, establishing a robust balance between efficiency and accuracy.

LGJan 9
Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection

Feihu Jin, Ying Tan

Fine-tuning large language models (LLMs) using standard first-order (FO) optimization often drives training toward sharp, poorly generalizing minima. Conversely, zeroth-order (ZO) methods offer stronger exploratory behavior without relying on explicit gradients, yet suffer from slow convergence. More critically, our analysis reveals that in generative tasks, the vast output and search space significantly amplify estimation variance, rendering ZO methods both noisy and inefficient. To address these challenges, we propose \textbf{Hi-ZFO} (\textbf{Hi}erarchical \textbf{Z}eroth- and \textbf{F}irst-\textbf{O}rder optimization), a hybrid framework designed to synergize the precision of FO gradients with the exploratory capability of ZO estimation. Hi-ZFO adaptively partitions the model through layer-wise importance profiling, applying precise FO updates to critical layers while leveraging ZO optimization for less sensitive ones. Notably, ZO in Hi-ZFO is not merely a memory-saving surrogate; it is intentionally introduced as a source of "beneficial stochasticity" to help the model escape the local minima where pure FO optimization tends to stagnate. Validated across diverse generative, mathematical, and code reasoning tasks, Hi-ZFO consistently achieves superior performance while significantly reducing the training time. These results demonstrate the effectiveness of hierarchical hybrid optimization for LLM fine-tuning.

CLFeb 8, 2024
Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Feihu Jin, Yifan Liu, Ying Tan

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks and exhibited impressive reasoning abilities by applying zero-shot Chain-of-Thought (CoT) prompting. However, due to the evolving nature of sentence prefixes during the pre-training phase, existing zero-shot CoT prompting methods that employ identical CoT prompting across all task instances may not be optimal. In this paper, we introduce a novel zero-shot prompting method that leverages evolutionary algorithms to generate diverse promptings for LLMs dynamically. Our approach involves initializing two CoT promptings, performing evolutionary operations based on LLMs to create a varied set, and utilizing the LLMs to select a suitable CoT prompting for a given problem. Additionally, a rewriting operation, guided by the selected CoT prompting, enhances the understanding of the LLMs about the problem. Extensive experiments conducted across ten reasoning datasets demonstrate the superior performance of our proposed method compared to current zero-shot CoT prompting methods on GPT-3.5-turbo and GPT-4. Moreover, in-depth analytical experiments underscore the adaptability and effectiveness of our method in various reasoning tasks.

CLMar 4, 2024
Derivative-Free Optimization for Low-Rank Adaptation in Large Language Models

Feihu Jin, Yin Liu, Ying Tan

Parameter-efficient tuning methods such as LoRA could achieve comparable performance to model tuning by tuning a small portion of the parameters. However, substantial computational resources are still required, as this process involves calculating gradients and performing back-propagation throughout the model. Much effort has recently been devoted to utilizing the derivative-free optimization method to eschew the computation of gradients and showcase an augmented level of robustness in few-shot settings. In this paper, we prepend the low-rank modules into each self-attention layer of the model and employ two derivative-free optimization methods to optimize these low-rank modules at each layer alternately. Extensive results on various tasks and language models demonstrate that our proposed method achieves substantial improvement and exhibits clear advantages in memory usage and convergence speed compared to existing gradient-based parameter-efficient tuning and derivative-free optimization methods in few-shot settings.

LGSep 24, 2024
Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer for Stock Time Series Forecasting

Wenbo Yan, Ying Tan

Spatial-temporal graph neural networks (STGNNs) have achieved significant success in various time series forecasting tasks. However, due to the lack of explicit and fixed spatial relationships in stock prediction tasks, many STGNNs fail to perform effectively in this domain. While some STGNNs learn spatial relationships from time series, they often lack comprehensiveness. Research indicates that modeling time series using feature changes as tokens reveals entirely different information compared to using time steps as tokens. To more comprehensively extract dynamic spatial information from stock data, we propose a Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer (DPA-STIFormer). DPA-STIFormer models each node via continuous changes in features as tokens and introduces a Double Direction Self-adaptation Fusion mechanism. This mechanism decomposes node encoding into temporal and feature representations, simultaneously extracting different spatial correlations from a double path approach, and proposes a Double-path gating mechanism to fuse these two types of correlation information. Experiments conducted on four stock market datasets demonstrate state-of-the-art results, validating the model's superior capability in uncovering latent temporal-correlation patterns.

AINov 5, 2025
Using Multi-modal Large Language Model to Boost Fireworks Algorithm's Ability in Settling Challenging Optimization Tasks

Shipeng Cen, Ying Tan

As optimization problems grow increasingly complex and diverse, advancements in optimization techniques and paradigm innovations hold significant importance. The challenges posed by optimization problems are primarily manifested in their non-convexity, high-dimensionality, black-box nature, and other unfavorable characteristics. Traditional zero-order or first-order methods, which are often characterized by low efficiency, inaccurate gradient information, and insufficient utilization of optimization information, are ill-equipped to address these challenges effectively. In recent years, the rapid development of large language models (LLM) has led to substantial improvements in their language understanding and code generation capabilities. Consequently, the design of optimization algorithms leveraging large language models has garnered increasing attention from researchers. In this study, we choose the fireworks algorithm(FWA) as the basic optimizer and propose a novel approach to assist the design of the FWA by incorporating multi-modal large language model(MLLM). To put it simply, we propose the concept of Critical Part(CP), which extends FWA to complex high-dimensional tasks, and further utilizes the information in the optimization process with the help of the multi-modal characteristics of large language models. We focus on two specific tasks: the \textit{traveling salesman problem }(TSP) and \textit{electronic design automation problem} (EDA). The experimental results show that FWAs generated under our new framework have achieved or surpassed SOTA results on many problem instances.

LGJul 26, 2024
TCGPN: Temporal-Correlation Graph Pre-trained Network for Stock Forecasting

Wenbo Yan, Ying Tan

Recently, the incorporation of both temporal features and the correlation across time series has become an effective approach in time series prediction. Spatio-Temporal Graph Neural Networks (STGNNs) demonstrate good performance on many Temporal-correlation Forecasting Problem. However, when applied to tasks lacking periodicity, such as stock data prediction, the effectiveness and robustness of STGNNs are found to be unsatisfactory. And STGNNs are limited by memory savings so that cannot handle problems with a large number of nodes. In this paper, we propose a novel approach called the Temporal-Correlation Graph Pre-trained Network (TCGPN) to address these limitations. TCGPN utilize Temporal-correlation fusion encoder to get a mixed representation and pre-training method with carefully designed temporal and correlation pre-training tasks. Entire structure is independent of the number and order of nodes, so better results can be obtained through various data enhancements. And memory consumption during training can be significantly reduced through multiple sampling. Experiments are conducted on real stock market data sets CSI300 and CSI500 that exhibit minimal periodicity. We fine-tune a simple MLP in downstream tasks and achieve state-of-the-art results, validating the capability to capture more robust temporal correlation patterns.

LGOct 13, 2025
Schrödinger bridge for generative AI: Soft-constrained formulation and convergence analysis

Jin Ma, Ying Tan, Renyuan Xu

Generative AI can be framed as the problem of learning a model that maps simple reference measures into complex data distributions, and it has recently found a strong connection to the classical theory of the Schrödinger bridge problems (SBPs) due partly to their common nature of interpolating between prescribed marginals via entropy-regularized stochastic dynamics. However, the classical SBP enforces hard terminal constraints, which often leads to instability in practical implementations, especially in high-dimensional or data-scarce regimes. To address this challenge, we follow the idea of the so-called soft-constrained Schrödinger bridge problem (SCSBP), in which the terminal constraint is replaced by a general penalty function. This relaxation leads to a more flexible stochastic control formulation of McKean-Vlasov type. We establish the existence of optimal solutions for all penalty levels and prove that, as the penalty grows, both the controls and value functions converge to those of the classical SBP at a linear rate. Our analysis builds on Doob's h-transform representations, the stability results of Schrödinger potentials, Gamma-convergence, and a novel fixed-point argument that couples an optimization problem over the space of measures with an auxiliary entropic optimal transport problem. These results not only provide the first quantitative convergence guarantees for soft-constrained bridges but also shed light on how penalty regularization enables robust generative modeling, fine-tuning, and transfer learning.

LGSep 26, 2025
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting

Hanzhong Cao, Wenbo Yan, Ying Tan

Many methods aim to enhance time series forecasting by decomposing the series through intricate model structures and prior knowledge, yet they are inevitably limited by computational complexity and the robustness of the assumptions. Our research uncovers that in the complex domain and higher-order hypercomplex spaces, the characteristic frequencies of time series naturally decrease. Leveraging this insight, we propose Numerion, a time series forecasting model based on multiple hypercomplex spaces. Specifically, grounded in theoretical support, we generalize linear layers and activation functions to hypercomplex spaces of arbitrary power-of-two dimensions and introduce a novel Real-Hypercomplex-Real Domain Multi-Layer Perceptron (RHR-MLP) architecture. Numerion utilizes multiple RHR-MLPs to map time series into hypercomplex spaces of varying dimensions, naturally decomposing and independently modeling the series, and adaptively fuses the latent patterns exhibited in different spaces through a dynamic fusion mechanism. Experiments validate the model`s performance, achieving state-of-the-art results on multiple public datasets. Visualizations and quantitative analyses comprehensively demonstrate the ability of multi-dimensional RHR-MLPs to naturally decompose time series and reveal the tendency of higher dimensional hypercomplex spaces to capture lower frequency features.

CVJun 14, 2025
MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation

Caixu Xu, Junming Wei, Huizhen Chen et al.

Recently, Mamba-based methods have become popular in medical image segmentation due to their lightweight design and long-range dependency modeling capabilities. However, current segmentation methods frequently encounter challenges in fetal ultrasound images, such as enclosed anatomical structures, blurred boundaries, and small anatomical structures. To address the need for balancing local feature extraction and global context modeling, we propose MS-UMamba, a novel hybrid convolutional-mamba model for fetal ultrasound image segmentation. Specifically, we design a visual state space block integrated with a CNN branch (SS-MCAT-SSM), which leverages Mamba's global modeling strengths and convolutional layers' local representation advantages to enhance feature learning. In addition, we also propose an efficient multi-scale feature fusion module that integrates spatial attention mechanisms, which Integrating feature information from different layers enhances the feature representation ability of the model. Finally, we conduct extensive experiments on a non-public dataset, experimental results demonstrate that MS-UMamba model has excellent performance in segmentation performance.

IVJun 10, 2025
DCD: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber View

Donglian Li, Hui Guo, Minglang Chen et al.

Accurate segmentation of anatomical structures in the apical four-chamber (A4C) view of fetal echocardiography is essential for early diagnosis and prenatal evaluation of congenital heart disease (CHD). However, precise segmentation remains challenging due to ultrasound artifacts, speckle noise, anatomical variability, and boundary ambiguity across different gestational stages. To reduce the workload of sonographers and enhance segmentation accuracy, we propose DCD, an advanced deep learning-based model for automatic segmentation of key anatomical structures in the fetal A4C view. Our model incorporates a Dense Atrous Spatial Pyramid Pooling (Dense ASPP) module, enabling superior multi-scale feature extraction, and a Convolutional Block Attention Module (CBAM) to enhance adaptive feature representation. By effectively capturing both local and global contextual information, DCD achieves precise and robust segmentation, contributing to improved prenatal cardiac assessment.

CVJun 9, 2025
FAMSeg: Fetal Femur and Cranial Ultrasound Segmentation Using Feature-Aware Attention and Mamba Enhancement

Jie He, Minglang Chen, Minying Lu et al.

Accurate ultrasound image segmentation is a prerequisite for precise biometrics and accurate assessment. Relying on manual delineation introduces significant errors and is time-consuming. However, existing segmentation models are designed based on objects in natural scenes, making them difficult to adapt to ultrasound objects with high noise and high similarity. This is particularly evident in small object segmentation, where a pronounced jagged effect occurs. Therefore, this paper proposes a fetal femur and cranial ultrasound image segmentation model based on feature perception and Mamba enhancement to address these challenges. Specifically, a longitudinal and transverse independent viewpoint scanning convolution block and a feature perception module were designed to enhance the ability to capture local detail information and improve the fusion of contextual information. Combined with the Mamba-optimized residual structure, this design suppresses the interference of raw noise and enhances local multi-dimensional scanning. The system builds global information and local feature dependencies, and is trained with a combination of different optimizers to achieve the optimal solution. After extensive experimental validation, the FAMSeg network achieved the fastest loss reduction and the best segmentation performance across images of varying sizes and orientations.

GNJun 2, 2025
GenDMR: A dynamic multimodal role-swapping network for identifying risk gene phenotypes

Lina Qin, Cheng Zhu, Chuqi Zhou et al.

Recent studies have shown that integrating multimodal data fusion techniques for imaging and genetic features is beneficial for the etiological analysis and predictive diagnosis of Alzheimer's disease (AD). However, there are several critical flaws in current deep learning methods. Firstly, there has been insufficient discussion and exploration regarding the selection and encoding of genetic information. Secondly, due to the significantly superior classification value of AD imaging features compared to genetic features, many studies in multimodal fusion emphasize the strengths of imaging features, actively mitigating the influence of weaker features, thereby diminishing the learning of the unique value of genetic features. To address this issue, this study proposes the dynamic multimodal role-swapping network (GenDMR). In GenDMR, we develop a novel approach to encode the spatial organization of single nucleotide polymorphisms (SNPs), enhancing the representation of their genomic context. Additionally, to adaptively quantify the disease risk of SNPs and brain region, we propose a multi-instance attention module to enhance model interpretability. Furthermore, we introduce a dominant modality selection module and a contrastive self-distillation module, combining them to achieve a dynamic teacher-student role exchange mechanism based on dominant and auxiliary modalities for bidirectional co-updating of different modal data. Finally, GenDMR achieves state-of-the-art performance on the ADNI public dataset and visualizes attention to different SNPs, focusing on confirming 12 potential high-risk genes related to AD, including the most classic APOE and recently highlighted significant risk genes. This demonstrates GenDMR's interpretable analytical capability in exploring AD genetic features, providing new insights and perspectives for the development of multimodal data fusion techniques.

LGApr 12, 2025
Repetitive Contrastive Learning Enhances Mamba's Selectivity in Time Series Prediction

Wenbo Yan, Hanzhong Cao, Ying Tan

Long sequence prediction is a key challenge in time series forecasting. While Mamba-based models have shown strong performance due to their sequence selection capabilities, they still struggle with insufficient focus on critical time steps and incomplete noise suppression, caused by limited selective abilities. To address this, we introduce Repetitive Contrastive Learning (RCL), a token-level contrastive pretraining framework aimed at enhancing Mamba's selective capabilities. RCL pretrains a single Mamba block to strengthen its selective abilities and then transfers these pretrained parameters to initialize Mamba blocks in various backbone models, improving their temporal prediction performance. RCL uses sequence augmentation with Gaussian noise and applies inter-sequence and intra-sequence contrastive learning to help the Mamba module prioritize information-rich time steps while ignoring noisy ones. Extensive experiments show that RCL consistently boosts the performance of backbone models, surpassing existing methods and achieving state-of-the-art results. Additionally, we propose two metrics to quantify Mamba's selective capabilities, providing theoretical, qualitative, and quantitative evidence for the improvements brought by RCL.

IVMar 26, 2025
Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis

Tianqi Tu, Hui Wang, Jiangbo Pei et al.

Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment of CI and provides valuable prognostic insights from a disease-specific perspective. Methods: We curated a dataset comprising 282 slides obtained from 141 patients across two independent cohorts with a complete 10-years follow-up. Our DL pipeline was developed on 60 slides (22,410 patch images) from 30 patients in the training cohort and evaluated on both an internal testing set (148 slides, 77,605 patch images) and an external testing set (74 slides, 27,522 patch images). Results: The study included two cohorts with slight demographic differences, particularly in age and hemoglobin levels. The DL pipeline showed high segmentation performance across tissue compartments and histopathologic lesions, outperforming state-of-the-art methods. The DL pipeline also demonstrated a strong correlation with pathologists in assessing CI, significantly improving interobserver agreement. Additionally, the DL pipeline enhanced prognostic accuracy, particularly in outcome prediction, when combined with clinical parameters and pathologist-assessed CIs Conclusions: The DL pipeline demonstrated accuracy and efficiency in assessing CI in LN, showing promise in improving interobserver agreement among pathologists. It also exhibited significant value in prognostic analysis and enhancing outcome prediction in LN patients, offering a valuable tool for clinical decision-making.

LGMar 14, 2025
Hierarchical Information-Guided Spatio-Temporal Mamba for Stock Time Series Forecasting

Wenbo Yan, Shurui Wang, Ying Tan

Mamba has demonstrated excellent performance in various time series forecasting tasks due to its superior selection mechanism. Nevertheless, conventional Mamba-based models encounter significant challenges in accurately predicting stock time series, as they fail to adequately capture both the overarching market dynamics and the intricate interdependencies among individual stocks. To overcome these constraints, we introduce the Hierarchical Information-Guided Spatio-Temporal Mamba (HIGSTM) framework. HIGSTM introduces Index-Guided Frequency Filtering Decomposition to extract commonality and specificity from time series. The model architecture features a meticulously designed hierarchical framework that systematically captures both temporal dynamic patterns and global static relationships within the stock market. Furthermore, we propose an Information-Guided Mamba that integrates macro informations into the sequence selection process, thereby facilitating more market-conscious decision-making. Comprehensive experimental evaluations conducted on the CSI500, CSI800 and CSI1000 datasets demonstrate that HIGSTM achieves state-of-the-art performance.

NCApr 25, 2021
Frequency Superposition -- A Multi-Frequency Stimulation Method in SSVEP-based BCIs

Jing Mu, David B. Grayden, Ying Tan et al.

The steady-state visual evoked potential (SSVEP) is one of the most widely used modalities in brain-computer interfaces (BCIs) due to its many advantages. However, the existence of harmonics and the limited range of responsive frequencies in SSVEP make it challenging to further expand the number of targets without sacrificing other aspects of the interface or putting additional constraints on the system. This paper introduces a novel multi-frequency stimulation method for SSVEP and investigates its potential to effectively and efficiently increase the number of targets presented. The proposed stimulation method, obtained by the superposition of the stimulation signals at different frequencies, is size-efficient, allows single-step target identification, puts no strict constraints on the usable frequency range, can be suited to self-paced BCIs, and does not require specific light sources. In addition to the stimulus frequencies and their harmonics, the evoked SSVEP waveforms include frequencies that are integer linear combinations of the stimulus frequencies. Results of decoding SSVEPs collected from nine subjects using canonical correlation analysis (CCA) with only the frequencies and harmonics as reference, also demonstrate the potential of using such a stimulation paradigm in SSVEP-based BCIs.

NCOct 27, 2020
Multi-Frequency Canonical Correlation Analysis (MFCCA): A Generalised Decoding Algorithm for Multi-Frequency SSVEP

Jing Mu, Ying Tan, David B. Grayden et al.

Stimulation methods that utilise more than one stimulation frequency have been developed for steady-state visual evoked potential (SSVEP) brain-computer interfaces (BCIs) with the purpose of increasing the number of targets that can be presented simultaneously. However, there is no unified decoding algorithm that can be used without training for each individual users or cases, and applied to a large class of multi-frequency stimulated SSVEP settings. This paper extends the widely used canonical correlation analysis (CCA) decoder to explicitly accommodate multi-frequency SSVEP by exploiting the interactions between the multiple stimulation frequencies. A concept of order, defined as the sum of absolute value of the coefficients in the linear combination of the input frequencies, was introduced to assist the design of Multi-Frequency CCA (MFCCA). The probability distribution of the order in the resulting SSVEP response was then used to improve decoding accuracy. Results show that, compared to the standard CCA formulation, the proposed MFCCA has a 20% improvement in decoding accuracy on average at order 2, while keeping its generality and training-free characteristics.

ROMar 2, 2020
The Use of Implicit Human Motor Behaviour in the Online Personalisation of Prosthetic Interfaces

Ricardo Garcia-Rosas, Tianshi Yu, Denny Oetomo et al.

In previous work, the authors proposed a data-driven optimisation algorithm for the personalisation of human-prosthetic interfaces, demonstrating the possibility of adapting prosthesis behaviour to its user while the user performs tasks with it. This method requires that the human and the prosthesis personalisation algorithm have same pre-defined objective function. This was previously ensured by providing the human with explicit feedback on what the objective function is. However, constantly displaying this information to the prosthesis user is impractical. Moreover, the method utilised task information in the objective function which may not be available from the wearable sensors typically used in prosthetic applications. In this work, the previous approach is extended to use a prosthesis objective function based on implicit human motor behaviour, which represents able-bodied human motor control and is measureable using wearable sensors. The approach is tested in a hardware implementation of the personalisation algorithm on a prosthetic elbow, where the prosthetic objective function is a function of upper-body compensation, and is measured using wearable IMUs. Experimental results on able-bodied subjects using a supernumerary prosthetic elbow mounted on an elbow orthosis suggest that it is possible to use a prosthesis objective function which is implicit in human behaviour to achieve collaboration without providing explicit feedback to the human, motivating further studies.

ROFeb 19, 2020
Task-space Synergies for Reaching using Upper-limb Prostheses

Ricardo Garcia-Rosas, Denny Oetomo, Chris Manzie et al.

Synergistic prostheses enable the coordinated movement of the human-prosthetic arm, as required by activities of daily living. This is achieved by coupling the motion of the prosthesis to the human command, such as the residual limb movement in motion-based interfaces. Previous studies demonstrated that developing human-prosthetic synergies in joint-space must consider individual motor behaviour and the intended task to be performed, requiring personalisation and task calibration. In this work, an alternative synergy-based strategy, utilising a synergistic relationship expressed in task-space, is proposed. This task-space synergy has the potential to replace the need for personalisation and task calibration with a model-based approach requiring knowledge of the individual user's arm kinematics, the anticipated hand motion during the task and voluntary information from the prosthetic user. The proposed method is compared with surface electromyography-based and joint-space synergy-based prosthetic interfaces in a study of motor behaviour and task performance on able-bodied subjects using a VR-based transhumeral prosthesis. Experimental results showed that for a set of forward reaching tasks the proposed task-space synergy achieves comparable performance to joint-space synergies without the need to rely on time-consuming calibration processes or human motor learning. Case study results with an amputee subject motivate the further development of the proposed task-space synergy method.

ROFeb 19, 2019
Personalized On-line Adaptation of Kinematic Synergies for Human-Prosthesis Interfaces

Ricardo Garcia-Rosas, Ying Tan, Denny Oetomo et al.

Synergies have been adopted in prosthetic limb applications to reduce complexity of design, but typically involve a single synergy setting for a population and ignore individual preference or adaptation capacity. However, personalization of the synergy setting is necessary for the effective operation of the prosthetic device. Two major challenges hinder the personalization of synergies in human-prosthesis interfaces. The first is related to the process of human motor adaptation and the second to the variation in motor learning dynamics of individuals. In this paper, a systematic personalization of kinematic synergies for human-prosthesis interfaces using on-line measurements from each individual is proposed. The task of reaching using the upper-limb is described by an objective function and the interface is parameterized by a kinematic synergy. Consequently, personalizing the interface for a given individual can be formulated as finding an optimal personalized parameter. A structure to model the observed motor behavior that allows for the personalized traits of motor preference and motor learning is proposed, and subsequently used in an on-line optimization scheme to identify the synergies for an individual. The knowledge of the common features contained in the model enables on-line adaptation of the human-prosthesis interface to happen concurrently to human motor adaptation without the need to re-tune the personalization algorithm for each individual. Human-in-the-loop experimental results with able-bodied subjects, performed in a virtual reality environment to emulate amputation and prosthesis use, show that the proposed personalization algorithm was effective in obtaining optimal synergies with a fast uniform convergence speed across a group of individuals.

BMFeb 9, 2019
Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

Chu Qin, Ying Tan, Shang Ying Chen et al.

Unsupervised clustering has broad applications in data stratification, pattern investigation and new discovery beyond existing knowledge. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally conducted by similarity-based methods, are complicated by data complexity and diversity. We ex-plored the superior learning capability of deep autoencoders for unsupervised clustering of 1.39 mil-lion bioactive molecules into band-clusters in a 3-dimensional latent chemical space. These band-clusters, displayed by a space-navigation simulation software, band molecules of selected bioactivity classes into individual band-clusters possessing unique sets of common sub-structural features beyond structural similarity. These sub-structural features form the frameworks of the literature-reported pharmacophores and privileged fragments. Within each band-cluster, molecules are further banded into selected sub-regions with respect to their bioactivity target, sub-structural features and molecular scaffolds. Our method is potentially applicable for big data clustering tasks of different fields.

ROJul 31, 2018
Accurate indoor mapping using an autonomous unmanned aerial vehicle (UAV)

Lachlan Dowling, Tomas Poblete, Isaac Hook et al.

An autonomous indoor aerial vehicle requires reliable simul- taneous localization and mapping (SLAM), accurate flight control, and robust path planning for navigation. This paper presents a system level combination of these existing technologies for 2D navigation. An Unmanned aerial vehicle (UAV) called URSA (Unmanned Recon and Safety Aircraft) that can autonomously flight and mapping indoors environments with an accuracy of 2 cm was developed. Performance in indoor environments was assessed in terms of mapping and navigation precision.

LGMay 23, 2017
Black-Box Attacks against RNN based Malware Detection Algorithms

Weiwei Hu, Ying Tan

Recent researches have shown that machine learning based malware detection algorithms are very vulnerable under the attacks of adversarial examples. These works mainly focused on the detection algorithms which use features with fixed dimension, while some researchers have begun to use recurrent neural networks (RNN) to detect malware based on sequential API features. This paper proposes a novel algorithm to generate sequential adversarial examples, which are used to attack a RNN based malware detection system. It is usually hard for malicious attackers to know the exact structures and weights of the victim RNN. A substitute RNN is trained to approximate the victim RNN. Then we propose a generative RNN to output sequential adversarial examples from the original sequential malware inputs. Experimental results showed that RNN based malware detection algorithms fail to detect most of the generated malicious adversarial examples, which means the proposed model is able to effectively bypass the detection algorithms.

LGFeb 20, 2017
Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN

Weiwei Hu, Ying Tan

Machine learning has been used to detect new malware in recent years, while malware authors have strong motivation to attack such algorithms. Malware authors usually have no access to the detailed structures and parameters of the machine learning models used by malware detection systems, and therefore they can only perform black-box attacks. This paper proposes a generative adversarial network (GAN) based algorithm named MalGAN to generate adversarial malware examples, which are able to bypass black-box machine learning based detection models. MalGAN uses a substitute detector to fit the black-box malware detection system. A generative network is trained to minimize the generated adversarial examples' malicious probabilities predicted by the substitute detector. The superiority of MalGAN over traditional gradient based adversarial example generation algorithms is that MalGAN is able to decrease the detection rate to nearly zero and make the retraining based defensive method against adversarial examples hard to work.

NEApr 6, 2016
Information Utilization Ratio in Heuristic Optimization Algorithms

Junzhi Li, Ying Tan

Heuristic algorithms are able to optimize objective functions efficiently because they use intelligently the information about the objective functions. Thus, information utilization is critical to the performance of heuristics. However, the concept of information utilization has remained vague and abstract because there is no reliable metric to reflect the extent to which the information about the objective function is utilized by heuristic algorithms. In this paper, the metric of information utilization ratio (IUR) is defined, which is the ratio of the utilized information quantity over the acquired information quantity in the search process. The IUR proves to be well-defined. Several examples of typical heuristic algorithms are given to demonstrate the procedure of calculating the IUR. Empirical evidences on the correlation between the IUR and the performance of a heuristic are also provided. The IUR can be an index of how finely an algorithm is designed and guide the invention of new heuristics and the improvement of existing ones.

CLMar 8, 2016
Variational Autoencoders for Semi-supervised Text Classification

Weidi Xu, Haoze Sun, Chao Deng et al.

Although semi-supervised variational autoencoder (SemiVAE) works in image classification task, it fails in text classification task if using vanilla LSTM as its decoder. From a perspective of reinforcement learning, it is verified that the decoder's capability to distinguish between different categorical labels is essential. Therefore, Semi-supervised Sequential Variational Autoencoder (SSVAE) is proposed, which increases the capability by feeding label into its decoder RNN at each time-step. Two specific decoder structures are investigated and both of them are verified to be effective. Besides, in order to reduce the computational complexity in training, a novel optimization method is proposed, which estimates the gradient of the unlabeled objective function by sampling, along with two variance reduction techniques. Experimental results on Large Movie Review Dataset (IMDB) and AG's News corpus show that the proposed approach significantly improves the classification accuracy compared with pure-supervised classifiers, and achieves competitive performance against previous advanced methods. State-of-the-art results can be obtained by integrating other pretraining-based methods.

NEMay 1, 2015
A Cooperative Framework for Fireworks Algorithm

Shaoqiu Zheng, Junzhi Li, Andreas Janecek et al.

This paper presents a cooperative framework for fireworks algorithm (CoFFWA). A detailed analysis of existing fireworks algorithm (FWA) and its recently developed variants has revealed that (i) the selection strategy lead to the contribution of the firework with the best fitness (core firework) for the optimization overwhelms the contributions of the rest of fireworks (non-core fireworks) in the explosion operator, (ii) the Gaussian mutation operator is not as effective as it is designed to be. To overcome these limitations, the CoFFWA is proposed, which can greatly enhance the exploitation ability of non-core fireworks by using independent selection operator and increase the exploration capacity by crowdness-avoiding cooperative strategy among the fireworks. Experimental results on the CEC2013 benchmark functions suggest that CoFFWA outperforms the state-of-the-art FWA variants, artificial bee colony, differential evolution, the standard particle swarm optimization (SPSO) in 2007 and the most recent SPSO in 2011 in term of convergence performance.

NEJul 29, 2014
A CUDA-Based Real Parameter Optimization Benchmark

Ke Ding, Ying Tan

Benchmarking is key for developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions of diverse properties are included within cuROB and implemented efficiently with CUDA. Speedup of one order of magnitude can be achieved in comparison with CPU-based benchmark of CEC'14.