Chong Wu

CV
h-index 21
8 papers
76 citations
Novelty 53%
AI Score 44

8 Papers

CV · Sep 27, 2023
Physics Inspired Hybrid Attention for SAR Target Recognition

Zhongling Huang, Chong Wu, Xiwen Yao et al.

There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters have garnered the most interest, being treated as additional input data or as features for fusion in most methods. However, performance depends greatly on the ASC optimization result, and the fusion strategy is not adaptable to different types of physical information. Meanwhile, the current evaluation scheme is inadequate for assessing a model's robustness and generalizability. Thus, we propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address these issues. PIHA leverages the high-level semantics of physical information to activate and guide the feature groups that are aware of the local semantics of the target, so as to re-weight feature importance based on prior knowledge. It is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture. The experiments involve a rigorous assessment using the proposed OFA, which entails training and validating a model on either sufficient or limited data and evaluating it on multiple test sets with different data distributions. Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters. Moreover, we analyze the working mechanism of PIHA and evaluate various PIHA-enabled DNNs. The experiments also show that PIHA is effective for different physical information. The source code together with the adopted physical information is available at https://github.com/XAI4SAR.

CL · Nov 1, 2022
Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

Jiangbin Zheng, Siyuan Li, Cheng Tan et al.

Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way of bridging the communication gap between deaf and hearing people, has attracted widespread academic attention. We found that the current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-resource data conditions. Therefore, we propose to introduce additional word-level semantic knowledge of sign language linguistics to help improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e. text and video, is first assembled as a dynamic graph according to their correlation, and then the graph is processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation models. To the best of our knowledge, we are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models. Moreover, we conducted experiments on the popular, publicly available SLT dataset RWTH-PHOENIX-Weather-2014T, and the quantitative experiments show that our method can improve the model.

GN · Oct 12, 2023
Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

Chen Zhao, Kuan-Jui Su, Chong Wu et al.

Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using burden scores, PGS, and LD-pruned SNPs derived from 35 template metabolites, the proposed method achieved $R^2$ scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.

NE · Mar 19, 2022
The Deep Learning model of Higher-Lower-Order Cognition, Memory, and Affection- More General Than KAN

Jun-Bo Tao, Bai-Qing Sun, Wei-Dong Zhu et al.

We first simulated disease dynamics with KAN (Kolmogorov-Arnold Networks) nearly 4 years ago; the kernel functions on the edges include the exponential number of infected and discharged people, which is also in line with the Kolmogorov-Arnold representation theorem, the shared weights on the edges are the infection rate and cure rate, and tanh was used as the activation function at the nodes of the edges. This arXiv preprint (version 1, March 2022) is an upgraded version of KAN that considers the invariant coarse-grained, which is calculated by the residual or gradient of the MSE loss. The improved KAN is PNN (Plasticity Neural Networks) or ELKAN (Edge Learning KNN); in addition to edge learning, it also considers the trimming of edges. We were inspired not by the Kolmogorov-Arnold representation theorem but by brain science. When ELKAN is used to explain the brain, the variables correspond to different types of neurons, the learning of edges can be explained by the rebalancing of synaptic strength and the phagocytosis of synapses by glial cells, the kernel function represents the discharge of neurons and synapses, and different neurons and edges represent brain regions. Through testing by cosine, the ELKAN or ORPNN (Optimized Range PNN) is better than the KAN or CRPNN (Constant Range PNN). The ELKAN is more general for exploring the brain, for example: the mechanism of consciousness, including interactions of natural frequencies in brain regions, synaptic and neuronal discharge frequencies, and data signal frequencies; the mechanism of Alzheimer's disease, where Alzheimer's patients have more high frequencies in the upstream brain regions; long short-term relatively good and inferior memory, which means gradient of architecture and architecture; turbulent energy flow in different brain regions, where turbulence critical conditions need to be met; and heart-brain quantum entanglement, which may occur between the emotions of the heartbeat and the synaptic strength of brain potentials.

CV · Jun 4, 2025
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought

Yi Lu, Jiawang Cao, Yongliang Wu et al. · utoronto

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable reasoning capability but lack explicit mechanisms for visual grounding and segmentation, creating a gap between cognitive reasoning and visual perception. To bridge this gap, we introduce Reasoning Segmentation via Visual Prompting (RSVP), a novel framework that unifies multi-step multimodal reasoning with grounded visual understanding. RSVP is a two-stage structuralized framework that integrates reasoning-driven localization with segmentation refinement. In the reasoning stage, RSVP employs multimodal chain-of-thought visual prompts to help MLLMs understand queries and infer targets, generating interpretable region proposals that enhance visual grounding. In the segmentation stage, RSVP refines these proposals with a Vision-Language Segmentation Module (VLSM), which seamlessly integrates textual and visual cues to produce precise segmentation masks. By explicitly modelling the interaction between multimodal reasoning and segmentation, RSVP introduces a new paradigm for interpretable reasoning segmentation. It exploits MLLMs' inherent localization capabilities, enabling the models to not only reason about objects but also generate structured visual representations. Our extensive experiments demonstrate that RSVP achieves state-of-the-art performance, surpassing prior state-of-the-art methods by up to +6.5 gIoU and +9.2 cIoU on ReasonSeg and achieving 49.7 mAP on SegInW under zero-shot settings. These results validate RSVP as an effective and scalable framework for integrating cognitive reasoning with structured visual understanding.
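
The abstract reports gains in gIoU and cIoU without defining them. As a rough illustration, the sketch below assumes the definitions commonly used with ReasonSeg: gIoU as the mean of per-image IoUs, and cIoU as the cumulative intersection divided by the cumulative union over the whole set. The function names, the flattened-list mask format, and the handling of empty masks are assumptions made for this example, not taken from the RSVP paper.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = object pixel, 0 = background


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and in the union of two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def giou_ciou(preds: List[Mask], targets: List[Mask]) -> Tuple[float, float]:
    """Return (gIoU, cIoU) over paired predicted and ground-truth masks."""
    per_image_ious = []
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumption: an empty prediction for an empty target counts as IoU 1.
        per_image_ious.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_image_ious) / len(per_image_ious)
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# Two toy 4-pixel images: the first is half right, the second is exact.
preds = [[1, 1, 0, 0], [1, 1, 1, 1]]
targets = [[1, 0, 0, 0], [1, 1, 1, 1]]
giou, ciou = giou_ciou(preds, targets)
```

Because cIoU pools pixels across images, large objects weigh more heavily in it than in gIoU, which is one reason the two numbers are usually reported together.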

NA · Jun 10, 2025
sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation

Renjie Xu, Chong Wu, Maolin Che et al.

We propose sparseGeoHOPCA, a novel framework for sparse higher-order principal component analysis (SHOPCA) that introduces a geometric perspective to high-dimensional tensor decomposition. By unfolding the input tensor along each mode and reformulating the resulting subproblems as structured binary linear optimization problems, our method transforms the original nonconvex sparse objective into a tractable geometric form. This eliminates the need for explicit covariance estimation and iterative deflation, enabling significant gains in both computational efficiency and interpretability, particularly in high-dimensional and unbalanced data scenarios. We theoretically establish the equivalence between the geometric subproblems and the original SHOPCA formulation, and derive worst-case approximation error bounds based on classical PCA residuals, providing data-dependent performance guarantees. The proposed algorithm achieves a total computational complexity of $O\left(\sum_{n=1}^{N} (k_n^3 + J_n k_n^2)\right)$, which scales linearly with tensor size. Extensive experiments demonstrate that sparseGeoHOPCA accurately recovers sparse supports in synthetic settings, preserves classification performance under 10$\times$ compression, and achieves high-quality image reconstruction on ImageNet, highlighting its robustness and versatility.
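
The first step the abstract names, unfolding the input tensor along each mode, can be sketched with NumPy as follows. This shows only the standard mode-n unfolding, not the sparseGeoHOPCA subproblems themselves. The helper name `unfold` and the column ordering (NumPy's default row-major order, which differs from some textbook conventions) are assumptions of this example.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    # Move the chosen mode to the front, then flatten the remaining modes.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# A small 2 x 3 x 4 tensor yields one matrix per mode.
tensor = np.arange(24).reshape(2, 3, 4)
unfoldings = [unfold(tensor, mode) for mode in range(tensor.ndim)]
shapes = [u.shape for u in unfoldings]
```

Each unfolding has as many rows as the size of its mode, so a per-mode subproblem operates on a matrix rather than on the full tensor.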

LG · Sep 29, 2025
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing

Yichi Zhang, Fangzheng Xie, Shu Yang et al.

In language tasks that require extensive human-model interaction, deploying a single "best" model for every query can be expensive. To reduce inference cost while preserving the quality of the responses, a large language model (LLM) router selects the most appropriate model from a pool of candidates for each query. A central challenge in training a high-quality router is the scarcity of reliable supervision. Gold-standard data (e.g., expert-verified labels or rubric-based scores) provide accurate quality evaluations of LLM responses but are costly and difficult to scale. In contrast, preference-based data, collected via crowdsourcing or LLM-as-a-judge systems, are cheaper and more scalable, yet often biased in reflecting the true quality of responses. We cast the problem of LLM router training with combined gold-standard and preference-based data into a causal inference framework by viewing the response evaluation mechanism as the treatment assignment. This perspective further reveals that the bias in preference-based data corresponds to a well-known causal estimand: the conditional average treatment effect. Based on this new perspective, we develop an integrative causal router training framework that corrects preference-data bias, addresses imbalances between the two data sources, and improves routing robustness and efficiency. Numerical experiments demonstrate that our approach delivers more accurate routing and improves the trade-off between cost and quality.
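
To make the routing setting concrete, here is a minimal sketch of a cost-aware routing rule. It does not implement the paper's causal training framework. It assumes quality predictions are already available and shows one simple way a router could trade cost against quality. The `Candidate` class, the `route` function, and the threshold rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float  # hypothetical per-query inference cost


def route(
    candidates: List[Candidate],
    predicted_quality: Dict[str, float],
    min_quality: float,
) -> Candidate:
    """Pick the cheapest model predicted to meet the quality bar.

    Assumed rule for illustration: if no model meets the bar, fall back
    to the model with the highest predicted quality.
    """
    good_enough = [c for c in candidates if predicted_quality[c.name] >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda c: c.cost)
    return max(candidates, key=lambda c: predicted_quality[c.name])


pool = [Candidate("small", cost=1.0), Candidate("large", cost=10.0)]
easy_query = route(pool, {"small": 0.90, "large": 0.95}, min_quality=0.8)
hard_query = route(pool, {"small": 0.50, "large": 0.95}, min_quality=0.8)
```

Under this rule, the accuracy of `predicted_quality` determines how often the cheap model can be used safely, which is why biased supervision of the kind described above matters.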

CV · Oct 18, 2019
A novel centroid update approach for clustering-based superpixel methods and superpixel-based edge detection

Houwang Zhang, Chong Wu, Le Zhang et al.

Superpixels are widely used in image processing. Among the methods for superpixel generation, clustering-based methods offer both high speed and good performance. However, most clustering-based superpixel methods are sensitive to noise. To solve this problem, in this paper, we first analyze the features of noise. Then, according to the statistical features of noise, we propose a novel centroid update approach to enhance the robustness of clustering-based superpixel methods. In addition, we propose a novel superpixel-based edge detection method. Experiments on the BSD500 dataset show that our approach can significantly enhance the performance of clustering-based superpixel methods in noisy environments. Moreover, we show that our proposed edge detection method outperforms other classical methods.