IVMar 14, 2023
Learning Homeomorphic Image Registration via Conformal-Invariant Hyperelastic RegularisationJing Zou, Noémie Debroux, Lihao Liu et al.
Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving transformations. This is a key property to preserve anatomical structures and achieve plausible transformations that can be used in real clinical settings. We propose a novel framework for deformable image registration. Firstly, we introduce a novel regulariser based on conformal-invariant properties in a nonlinear elasticity setting. Our regulariser enforces the deformation field to be smooth, invertible and orientation-preserving. More importantly, we strictly guarantee topology preservation yielding to a clinical meaningful registration. Secondly, we boost the performance of our regulariser through coordinate MLPs, where one can view the to-be-registered images as continuously differentiable entities. We demonstrate, through numerical and visual experiments, that our framework is able to outperform current techniques for image registration.
LGFeb 20, 2024Code
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations ModelingGuoqi Yu, Jing Zou, Xiaowei Hu et al.
Predicting multivariate time series is crucial, demanding precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Distinctive trend characteristics in each time series pose challenges, and existing methods, relying on basic moving average kernels, may struggle with the non-linear structure and complex trends in real-world data. Given that, we introduce a learnable decomposition strategy to capture dynamic trend information more reasonably. Additionally, we propose a dual attention module tailored to capture inter-series dependencies and intra-series variations simultaneously for better time series forecasting, which is implemented by channel-wise self-attention and autoregressive self-attention. To evaluate the effectiveness of our method, we conducted experiments across eight open-source datasets and compared it with the state-of-the-art methods. Through the comparison results, our Leddam (LEarnable Decomposition and Dual Attention Module) not only demonstrates significant advancements in predictive performance, but also the proposed decomposition strategy can be plugged into other methods with a large performance-boosting, from 11.87% to 48.56% MSE error degradation.
AIMay 17, 2025Code
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error CorrectionJing Zou, Qingqiu Li, Chenyu Lian et al.
AI-driven models have shown great promise in detecting errors in radiology reports, yet the field lacks a unified benchmark for rigorous evaluation of error detection and further correction. To address this gap, we introduce CorBenchX, a comprehensive suite for automated error detection and correction in chest X-ray reports, designed to advance AI-assisted quality control in clinical practice. We first synthesize a large-scale dataset of 26,326 chest X-ray error reports by injecting clinically common errors via prompting DeepSeek-R1, with each corrupted report paired with its original text, error type, and human-readable description. Leveraging this dataset, we benchmark both open- and closed-source vision-language models,(e.g., InternVL, Qwen-VL, GPT-4o, o4-mini, and Claude-3.7) for error detection and correction under zero-shot prompting. Among these models, o4-mini achieves the best performance, with 50.6 % detection accuracy and correction scores of BLEU 0.853, ROUGE 0.924, BERTScore 0.981, SembScore 0.865, and CheXbertF1 0.954, remaining below clinical-level accuracy, highlighting the challenge of precise report correction. To advance the state of the art, we propose a multi-step reinforcement learning (MSRL) framework that optimizes a multi-objective reward combining format compliance, error-type accuracy, and BLEU similarity. We apply MSRL to QwenVL2.5-7B, the top open-source model in our benchmark, achieving an improvement of 38.3% in single-error detection precision and 5.2% in single-error correction over the zero-shot baseline.
LGSep 15, 2025
DARD: Dice Adversarial Robustness Distillation against Adversarial AttacksJing Zou, Shungeng Zhang, Meikang Qiu et al.
Deep learning models are vulnerable to adversarial examples, posing critical security challenges in real-world applications. While Adversarial Training (AT ) is a widely adopted defense mechanism to enhance robustness, it often incurs a trade-off by degrading performance on unperturbed, natural data. Recent efforts have highlighted that larger models exhibit enhanced robustness over their smaller counterparts. In this paper, we empirically demonstrate that such robustness can be systematically distilled from large teacher models into compact student models. To achieve better performance, we introduce Dice Adversarial Robustness Distillation (DARD), a novel method designed to transfer robustness through a tailored knowledge distillation paradigm. Additionally, we propose Dice Projected Gradient Descent (DPGD), an adversarial example generalization method optimized for effective attack. Our extensive experiments demonstrate that the DARD approach consistently outperforms adversarially trained networks with the same architecture, achieving superior robustness and standard accuracy.
IVJun 27, 2024
MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information FusionJing Zou, Lanqing Liu, Qi Chen et al.
Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as guidance. The primary challenge of this task is comprehensively and efficiently integrating complementary information from different modalities to achieve high-quality reconstruction. Existing methods struggle with this: 1) convolution-based models fail to capture long-range dependencies; 2) transformer-based models, while excelling in global feature modeling, struggle with quadratic computational complexity. To address this, we propose MMR-Mamba, a novel framework that thoroughly and efficiently integrates multi-modal features for MRI reconstruction, leveraging Mamba's capability to capture long-range dependencies with linear computational complexity while exploiting global properties of the Fourier domain. Specifically, we first design a Target modality-guided Cross Mamba (TCM) module in the spatial domain, which maximally restores the target modality information by selectively incorporating relevant information from the reference modality. Then, we introduce a Selective Frequency Fusion (SFF) module to efficiently integrate global information in the Fourier domain and recover high-frequency signals for the reconstruction of structural details. Furthermore, we devise an Adaptive Spatial-Frequency Fusion (ASFF) module, which mutually enhances the spatial and frequency domains by supplementing less informative channels from one domain with corresponding channels from the other.
NAJun 10, 2005
Theoretical and Experimental Analysis of a Randomized Algorithm for Sparse Fourier Transform AnalysisJing Zou, Anna Gilbert, Martin Strauss et al.
We analyze a sublinear RAlSFA (Randomized Algorithm for Sparse Fourier Analysis) that finds a near-optimal B-term Sparse Representation R for a given discrete signal S of length N, in time and space poly(B,log(N)), following the approach given in \cite{GGIMS}. Its time cost poly(log(N)) should be compared with the superlinear O(N log N) time requirement of the Fast Fourier Transform (FFT). A straightforward implementation of the RAlSFA, as presented in the theoretical paper \cite{GGIMS}, turns out to be very slow in practice. Our main result is a greatly improved and practical RAlSFA. We introduce several new ideas and techniques that speed up the algorithm. Both rigorous and heuristic arguments for parameter choices are presented. Our RAlSFA constructs, with probability at least 1-delta, a near-optimal B-term representation R in time poly(B)log(N)log(1/delta)/ epsilon^{2} log(M) such that ||S-R||^{2}<=(1+epsilon)||S-R_{opt}||^{2}. Furthermore, this RAlSFA implementation already beats the FFTW for not unreasonably large N. We extend the algorithm to higher dimensional cases both theoretically and numerically. The crossover point lies at N=70000 in one dimension, and at N=900 for data on a N*N grid in two dimensions for small B signals where there is noise.
NAFeb 16, 2005
A Sublinear Algorithm of Sparse Fourier Transform for Nonequispaced DataJing Zou
We present a sublinear randomized algorithm to compute a sparse Fourier transform for nonequispaced data. Suppose a signal S is known to consist of N equispaced samples, of which only L<N are available. If the ratio p=L/N is not close to 1, the available data are typically non-equispaced samples. Then our algorithm reconstructs a near-optimal B-term representation R with high probability 1-delta, in time and space poly(B,log(L),log p, log(1/delta), epsilon^{-1}, such that ||S-R||^2 < (1+epsilon) ||S-R_{opt}^B||^2, where R_{opt}^B is the optimal B-term Fourier representation of signal S. The sublinear poly(logL) time is compared to the superlinear O(Nlog N+L) time requirement of the present best known Inverse Nonequispaced Fast Fourier Transform (INFFT) algorithms. Numerical experiments support the advantage in speed of our algorithm over other methods for sparse signals: it already outperforms INFFT for large but realistic size N and works well even in the situation of a large percentage of missing data and in the presence of noise.