Raymond H. Chan

CV
h-index21
26papers
186citations
Novelty54%
AI Score56

26 Papers

NANov 10, 2015
Restoration of Manifold-Valued Images by Half-Quadratic Minimization

Ronny Bergmann, Raymond H. Chan, Ralf Hielscher et al.

The paper addresses the generalization of the half-quadratic minimization method for the restoration of images having values in a complete Riemannian manifold. We recall the half-quadratic minimization method using the notation of the c-transform and adapt the algorithm to our special variational setting. We prove the convergence of the method for Hadamard spaces. Extensive numerical examples for images with values on spheres, in the rotation group SO(3) and in the manifold of positive definite matrices demonstrate the excellent performance of the algorithm. In particular, the method with SO(3)-valued data shows promising results for the restoration of images obtained from Electron Backscattered Diffraction which are of interest in material science.

CVMar 29, 2022
Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation

Ruoning Li, Kangning Cui, Raymond H. Chan et al.

In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.

LGFeb 14, 2023
Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

Dong Li, Shuisheng Zhou, Tieyong Zeng et al.

K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select the appropriate number of multi-prototypes for data with arbitrary shapes. A theoretical proof is given to guarantee that the multi-prototypes selected by MPS can achieve a constant factor approximation to the optimal cost of the K-Means problem. Then, a merging technique, called convex merging (CM), merges the multi-prototypes to get a better local minima without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of K-Means problem without given k first. Experimental results performed on synthetic and real-world data sets have verified the effectiveness of the proposed algorithm.

CVApr 20, 2022
A 3-stage Spectral-spatial Method for Hyperspectral Image Classification

Raymond H. Chan, Ruoning Li

Hyperspectral images often have hundreds of spectral bands of different wavelengths captured by aircraft or satellites that record land coverage. Identifying detailed classes of pixels becomes feasible due to the enhancement in spectral and spatial resolution of hyperspectral images. In this work, we propose a novel framework that utilizes both spatial and spectral information for classifying pixels in hyperspectral images. The method consists of three stages. In the first stage, the pre-processing stage, Nested Sliding Window algorithm is used to reconstruct the original data by {enhancing the consistency of neighboring pixels} and then Principal Component Analysis is used to reduce the dimension of data. In the second stage, Support Vector Machines are trained to estimate the pixel-wise probability map of each class using the spectral information from the images. Finally, a smoothed total variation model is applied to smooth the class probability vectors by {ensuring spatial connectivity} in the images. We demonstrate the superiority of our method against three state-of-the-art algorithms on six benchmark hyperspectral data sets with 10 to 50 training labels for each class. The results show that our method gives the overall best performance in accuracy. Especially, our gain in accuracy increases when the number of labeled pixels decreases and therefore our method is more advantageous to be applied to problems with small training set. Hence it is of great practical significance since expert annotations are often expensive and difficult to collect.

CVJun 19, 2022
Semi-supervised Change Detection of Small Water Bodies Using RGB and Multispectral Images in Peruvian Rainforests

Kangning Cui, Seda Camalan, Ruoning Li et al.

Artisanal and Small-scale Gold Mining (ASGM) is an important source of income for many households, but it can have large social and environmental effects, especially in rainforests of developing countries. The Sentinel-2 satellites collect multispectral images that can be used for the purpose of detecting changes in water extent and quality which indicates the locations of mining sites. This work focuses on the recognition of ASGM activities in Peruvian Amazon rainforests. We tested several semi-supervised classifiers based on Support Vector Machines (SVMs) to detect the changes of water bodies from 2019 to 2021 in the Madre de Dios region, which is one of the global hotspots of ASGM activities. Experiments show that SVM-based models can achieve reasonable performance for both RGB (using Cohen's $κ$ 0.49) and 6-channel images (using Cohen's $κ$ 0.71) with very limited annotations. The efficacy of incorporating Lab color space for change detection is analyzed as well.

CVApr 28, 2022
Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry

Kangning Cui, Ruoning Li, Sam L. Polk et al.

Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.

IVFeb 22, 2023Code
A Global and Patch-wise Contrastive Loss for Accurate Automated Exudate Detection

Wei Tang, Kangning Cui, Raymond H. Chan

Diabetic retinopathy (DR) is a leading global cause of blindness. Early detection of hard exudates plays a crucial role in identifying DR, which aids in treating diabetes and preventing vision loss. However, the unique characteristics of hard exudates, ranging from their inconsistent shapes to indistinct boundaries, pose significant challenges to existing segmentation techniques. To address these issues, we present a novel supervised contrastive learning framework to optimize hard exudate segmentation. Specifically, we introduce a patch-wise density contrasting scheme to distinguish between areas with varying lesion concentrations, and therefore improve the model's proficiency in segmenting small lesions. To handle the ambiguous boundaries, we develop a discriminative edge inspection module to dynamically analyze the pixels that lie around the boundaries and accurately delineate the exudates. Upon evaluation using the IDRiD dataset and comparison with state-of-the-art frameworks, our method exhibits its effectiveness and shows potential for computer-assisted hard exudate detection. The code to replicate experiments is available at github.com/wetang7/HECL/.

CVNov 21, 2023
TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations

Zhenda Shen, Yanqi Cheng, Raymond H. Chan et al.

Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of its energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrated through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions.

CVNov 22, 2023
Single-Shot Plug-and-Play Methods for Inverse Problems

Yanqi Cheng, Lipei Zhang, Zhenda Shen et al.

The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.

65.3LGMay 16
Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons

Wei Tang, Jinpei Han, Kangning Cui et al.

Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments, have demonstrated strong transferability across a range of clinical applications. However, many real-world applications produce recordings that are typically longer, and are varied in duration during inference time. These 10-second models have no built-in way to combine information across time. Extending them to longer horizons introduces two challenges: structural incompatibilities arising from input-length disparities, and semantic challenges that limit meaningful temporal aggregation. We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling. Experiments on multiple long-horizon ECG tasks, datasets, and foundation model backbones demonstrate that our method enables robust long-horizon extension from pretrained snapshot models, consistently outperforming sliding-window and pooling-based baselines with strong parameter efficiency.

CVJul 22, 2025Code
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation

Yaofang Liu, Yumeng Ren, Aitor Artola et al.

The rapid advancement of video diffusion models has been hindered by fundamental limitations in temporal modeling, particularly the rigid synchronization of frame evolution imposed by conventional scalar timestep variables. While task-specific adaptations and autoregressive models have sought to address these challenges, they remain constrained by computational inefficiency, catastrophic forgetting, or narrow applicability. In this work, we present Pusa, a groundbreaking paradigm that leverages vectorized timestep adaptation (VTA) to enable fine-grained temporal control within a unified video diffusion framework. Besides, VTA is a non-destructive adaptation, which means it fully preserves the capabilities of the base model. By finetuning the SOTA Wan2.1-T2V-14B model with VTA, we achieve unprecedented efficiency -- surpassing the performance of Wan-I2V-14B with $\leq$ 1/200 of the training cost (\$500 vs. $\geq$ \$100,000) and $\leq$ 1/2500 of the dataset size (4K vs. $\geq$ 10M samples). Pusa not only sets a new standard for image-to-video (I2V) generation, achieving a VBench-I2V total score of 87.32\% (vs. 86.86\% of Wan-I2V-14B), but also unlocks many zero-shot multi-task capabilities such as start-end frames and video extension -- all without task-specific training. Meanwhile, Pusa can still perform text-to-video generation. Mechanistic analyses reveal that our approach preserves the foundation model's generative priors while surgically injecting temporal dynamics, avoiding the combinatorial explosion inherent to vectorized timesteps. This work establishes a scalable, efficient, and versatile paradigm for next-generation video synthesis, democratizing high-fidelity video generation for research and industry alike. Code is open-sourced at https://github.com/Yaofang-Liu/Pusa-VidGen

CVOct 16, 2024Code
Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Yang Liu, Yaofang Liu, Jinshan Pan et al.

Most existing super-resolution methods and datasets have been developed to improve the image quality in well-lighted conditions. However, these methods do not work well in real-world low-light conditions as the images captured in such conditions lose most important information and contain significant unknown noises. To solve this problem, we propose a SRRIIE dataset with an efficient conditional diffusion probabilistic models-based method. The proposed dataset contains 4800 paired low-high quality images. To ensure that the dataset are able to model the real-world image degradation in low-illumination environments, we capture images using an ILDC camera and an optical zoom lens with exposure levels ranging from -6 EV to 0 EV and ISO levels ranging from 50 to 12800. We comprehensively evaluate with various reconstruction and perceptual metrics and demonstrate the practicabilities of the SRRIIE dataset for deep learning-based methods. We show that most existing methods are less effective in preserving the structures and sharpness of restored images from complicated noises. To overcome this problem, we revise the condition for Raw sensor data and propose a novel time-melding condition for diffusion probabilistic model. Comprehensive quantitative and qualitative experimental results on the real-world benchmark datasets demonstrate the feasibility and effectivenesses of the proposed conditional diffusion probabilistic model on Raw sensor data. Code and dataset will be available at https://github.com/Yaofang-Liu/Super-Resolving

86.9CVMay 12
Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm

Yaofang Liu, Kangning Cui, Meng Chu et al.

Humans often specify and create through visual artifacts: typography sheets, sketches, reference images, and annotated scenes. Yet modern visual generators still ask users to serialize this intent into text, a bottleneck that compresses signals like spatial structure, exact appearance, and glyph shape. We propose \textbf{\emph{visual-to-visual} (V2V)} generation, in which the user conditions a generative model with a visual specification page rather than a text prompt. The page is not an edit target, but a visual document that specifies the desired output. We introduce \textbf{V2V-Zero}, a training-free framework that exposes this interface in existing vision-language model (VLM) conditioned generators by replacing text-only conditioning with final-layer hidden states extracted from visual pages, exploiting the fact that the frozen VLM already maps both text and images into the generator's conditioning space. On GenEval, V2V-Zero reaches 0.85 with a frozen Qwen-Image backbone, closely matching its optimized text-to-image performance without fine-tuning. To evaluate the broader V2V space, we introduce \textbf{Simple-V2V Bench}, spanning seven visual-conditioning tasks and seven models, including GPT Image 2, Nano Banana 2, Seedream 5.0 Lite, open-weight baselines, and a video extension. V2V-Zero scores 32.7/100, outperforming evaluated open-weight image baselines and revealing a clear capability hierarchy: attribute binding is strong, content generation is unreliable, and structural control remains hard even for commercial systems. A HunyuanVideo-1.5 extension scores 20.2/100, showing the interface transfers beyond images. Mechanistic analysis shows the default reasoning path is primarily visually routed, with 95.0\% of conditioning-token attention mass on visual-page hidden states.

CVDec 24, 2023
Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering

Kangning Cui, Ruoning Li, Sam L. Polk et al.

Hyperspectral images (HSIs) provide exceptional spatial and spectral resolution of a scene, crucial for various remote sensing applications. However, the high dimensionality, presence of noise and outliers, and the need for precise labels of HSIs present significant challenges to HSIs analysis, motivating the development of performant HSI clustering algorithms. This paper introduces a novel unsupervised HSI clustering algorithm, Superpixel-based and Spatially-regularized Diffusion Learning (S2DL), which addresses these challenges by incorporating rich spatial information encoded in HSIs into diffusion geometry-based clustering. S2DL employs the Entropy Rate Superpixel (ERS) segmentation technique to partition an image into superpixels, then constructs a spatially-regularized diffusion graph using the most representative high-density pixels. This approach reduces computational burden while preserving accuracy. Cluster modes, serving as exemplars for underlying cluster structure, are identified as the highest-density pixels farthest in diffusion distance from other highest-density pixels. These modes guide the labeling of the remaining representative pixels from ERS superpixels. Finally, majority voting is applied to the labels assigned within each superpixel to propagate labels to the rest of the image. This spatial-spectral approach simultaneously simplifies graph construction, reduces computational cost, and improves clustering performance. S2DL's performance is illustrated with extensive experiments on three publicly available, real-world HSIs: Indian Pines, Salinas, and Salinas A. Additionally, we apply S2DL to landscape-scale, unsupervised mangrove species mapping in the Mai Po Nature Reserve, Hong Kong, using a Gaofen-5 HSI. The success of S2DL in these diverse numerical experiments indicates its efficacy on a wide range of important unsupervised remote sensing analysis tasks.

CVDec 31, 2023
Double-well Net for Image Segmentation

Hao Liu, Jun Liu, Raymond H. Chan et al.

In this study, our goal is to integrate classical mathematical models with deep neural networks by introducing two novel deep neural network models for image segmentation known as Double-well Nets. Drawing inspirations from the Potts model, our models leverage neural networks to represent a region force functional. We extend the well-know MBO (Merriman-Bence-Osher) scheme to solve the Potts model. The widely recognized Potts model is approximated using a double-well potential and then solved by an operator-splitting method, which turns out to be an extension of the well-known MBO scheme. Subsequently, we replace the region force functional in the Potts model with a UNet-type network, which is data-driven and is designed to capture multiscale features of images, and also introduce control variables to enhance effectiveness. The resulting algorithm is a neural network activated by a function that minimizes the double-well potential. What sets our proposed Double-well Nets apart from many existing deep learning methods for image segmentation is their strong mathematical foundation. They are derived from the network approximation theory and employ the MBO scheme to approximately solve the Potts model. By incorporating mathematical principles, Double-well Nets bridge the MBO scheme and neural networks, and offer an alternative perspective for designing networks with mathematical backgrounds. Through comprehensive experiments, we demonstrate the performance of Double-well Nets, showcasing their superior accuracy and robustness compared to state-of-the-art neural networks. Overall, our work represents a valuable contribution to the field of image segmentation by combining the strengths of classical variational models and deep neural networks. The Double-well Nets introduce an innovative approach that leverages mathematical foundations to enhance segmentation performance.

CVOct 14, 2024
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery

Kangning Cui, Wei Tang, Rongkun Zhu et al.

Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.

LGOct 5, 2025
A Mathematical Explanation of Transformers for Large Language Models and GPTs

Xue-Cheng Tai, Hao Liu, Lingfeng Li et al.

The Transformer architecture has revolutionized the field of sequence modeling and underpins the recent breakthroughs in large language models (LLMs). However, a comprehensive mathematical theory that explains its structure and operations remains elusive. In this work, we propose a novel continuous framework that rigorously interprets the Transformer as a discretization of a structured integro-differential equation. Within this formulation, the self-attention mechanism emerges naturally as a non-local integral operator, and layer normalization is characterized as a projection to a time-dependent constraint. This operator-theoretic and variational perspective offers a unified and interpretable foundation for understanding the architecture's core components, including attention, feedforward layers, and normalization. Our approach extends beyond previous theoretical analyses by embedding the entire Transformer operation in continuous domains for both token indices and feature dimensions. This leads to a principled and flexible framework that not only deepens theoretical insight but also offers new directions for architecture design, analysis, and control-based interpretations. This new interpretation provides a step toward bridging the gap between deep learning architectures and continuous mathematical modeling, and contributes a foundational perspective to the ongoing development of interpretable and theoretically grounded neural network models.

CVNov 25, 2025
Blind Adaptive Local Denoising for CEST Imaging

Chu Chen, Aitor Artola, Yang Liu et al.

Chemical Exchange Saturation Transfer (CEST) MRI enables molecular-level visualization of low-concentration metabolites by leveraging proton exchange dynamics. However, its clinical translation is hindered by inherent challenges: spatially varying noise arising from hardware limitations, and complex imaging protocols introduce heteroscedasticity in CEST data, perturbing the accuracy of quantitative contrast mapping such as amide proton transfer (APT) imaging. Traditional denoising methods are not designed for this complex noise and often alter the underlying information that is critical for biomedical analysis. To overcome these limitations, we propose a new Blind Adaptive Local Denoising (BALD) method. BALD exploits the self-similar nature of CEST data to derive an adaptive variance-stabilizing transform that equalizes the noise distributions across CEST pixels without prior knowledge of noise characteristics. Then, BALD performs two-stage denoising on a linear transformation of data to disentangle molecular signals from noise. A local SVD decomposition is used as a linear transform to prevent spatial and spectral denoising artifacts. We conducted extensive validation experiments on multiple phantoms and \textit{in vivo} CEST scans. In these experiments, BALD consistently outperformed state-of-the-art CEST denoisers in both denoising metrics and downstream tasks such as molecular concentration maps estimation and cancer detection.

CVSep 17, 2025
LamiGauss: Pitching Radiative Gaussian for Sparse-View X-ray Laminography Reconstruction

Chu Chen, Ander Biguri, Jean-Michel Morel et al.

X-ray Computed Laminography (CL) is essential for non-destructive inspection of plate-like structures in applications such as microchips and composite battery materials, where traditional computed tomography (CT) struggles due to geometric constraints. However, reconstructing high-quality volumes from laminographic projections remains challenging, particularly under highly sparse-view acquisition conditions. In this paper, we propose a reconstruction algorithm, namely LamiGauss, that combines Gaussian Splatting radiative rasterization with a dedicated detector-to-world transformation model incorporating the laminographic tilt angle. LamiGauss leverages an initialization strategy that explicitly filters out common laminographic artifacts from the preliminary reconstruction, preventing redundant Gaussians from being allocated to false structures and thereby concentrating model capacity on representing the genuine object. Our approach effectively optimizes directly from sparse projections, enabling accurate and efficient reconstruction with limited data. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness and superiority of the proposed method over existing techniques. LamiGauss uses only 3$\%$ of full views to achieve superior performance over the iterative method optimized on a full dataset.

NAAug 15, 2025
Fluid Dynamics and Domain Reconstruction from Noisy Flow Images Using Physics-Informed Neural Networks and Quasi-Conformal Mapping

Han Zhang, Xue-Cheng Tai, Jean-Michel Morel et al.

Blood flow imaging provides important information for hemodynamic behavior within the vascular system and plays an essential role in medical diagnosis and treatment planning. However, obtaining high-quality flow images remains a significant challenge. In this work, we address the problem of denoising flow images that may suffer from artifacts due to short acquisition times or device-induced errors. We formulate this task as an optimization problem, where the objective is to minimize the discrepancy between the modeled velocity field, constrained to satisfy the Navier-Stokes equations, and the observed noisy velocity data. To solve this problem, we decompose it into two subproblems: a fluid subproblem and a geometry subproblem. The fluid subproblem leverages a Physics-Informed Neural Network to reconstruct the velocity field from noisy observations, assuming a fixed domain. The geometry subproblem aims to infer the underlying flow region by optimizing a quasi-conformal mapping that deforms a reference domain. These two subproblems are solved in an alternating Gauss-Seidel fashion, iteratively refining both the velocity field and the domain. Upon convergence, the framework yields a high-quality reconstruction of the flow image. We validate the proposed method through experiments on synthetic flow data in a converging channel geometry under varying levels of Gaussian noise, and on real-like flow data in an aortic geometry with signal-dependent noise. The results demonstrate the effectiveness and robustness of the approach. Additionally, ablation studies are conducted to assess the influence of key hyperparameters.

CVMay 20, 2025
Blind Restoration of High-Resolution Ultrasound Video

Chu Chen, Kangning Cui, Pasquale Cascarano et al.

Ultrasound imaging is widely applied in clinical practice, yet ultrasound videos often suffer from low signal-to-noise ratios (SNR) and limited resolutions, posing challenges for diagnosis and analysis. Variations in equipment and acquisition settings can further exacerbate differences in data distribution and noise levels, reducing the generalizability of pre-trained models. This work presents a self-supervised ultrasound video super-resolution algorithm called Deep Ultrasound Prior (DUP). DUP employs a video-adaptive optimization process of a neural network that enhances the resolution of given ultrasound videos without requiring paired training data while simultaneously removing noise. Quantitative and visual evaluations demonstrate that DUP outperforms existing super-resolution algorithms, leading to substantial improvements for downstream applications.

CVMar 22, 2025
Efficient Diffusion Training through Parallelization with Truncated Karhunen-Loève Expansion

Yumeng Ren, Yaofang Liu, Aitor Artola et al.

Diffusion denoising models have become a popular approach for image generation, but they often suffer from slow convergence during training. In this paper, we identify that this slow convergence is partly due to the complexity of the Brownian motion driving the forward-time process. To address this, we represent the Brownian motion using the Karhunen-Loève expansion, truncating it to a limited number of eigenfunctions. We propose a novel ordinary differential equation with augmented random initials, termed KL diffusion, as a new forward-time process for training and sampling. By developing an appropriate denoising loss function, we facilitate the integration of our KL-diffusion into existing denoising-based models. Using the widely adopted DDIM framework as our baseline ensures a fair comparison, as our modifications focus solely on the forward process and loss function, leaving the network architecture and sampling methods unchanged. Our method significantly outperforms baseline diffusion models, achieving convergence speeds that are twice faster to reach the best FID score of the baseline and ultimately yielding much lower FID scores. Notably, our approach allows for highly parallelized computation, requires no additional learnable parameters, and can be flexibly integrated into existing diffusion methods. The code will be made publicly available.

LGOct 17, 2024
PiLocNet: Physics-informed neural network on 3D localization with rotating point spread function

Mingda Lu, Zitian Ao, Chao Wang et al.

For the 3D localization problem using point spread function (PSF) engineering, we propose a novel enhancement of our previously introduced localization neural network, LocNet. The improved network is a physics-informed neural network (PINN) that we call PiLocNet. Previous works on the localization problem may be categorized separately into model-based optimization and neural network approaches. Our PiLocNet combines the unique strengths of both approaches by incorporating forward-model-based information into the network via a data-fitting loss term that constrains the neural network to yield results that are physically sensible. We additionally incorporate certain regularization terms from the variational method, which further improves the robustness of the network in the presence of image noise, as we show for the Poisson and Gaussian noise models. This framework accords interpretability to the neural network, and the results we obtain show its superiority. Although the paper focuses on the use of single-lobe rotating PSF to encode the full 3D source location, we expect the method to be widely applicable to other PSFs and imaging problems that are constrained by known forward processes.

CVApr 17, 2017
A Nuclear-norm Model for Multi-Frame Super-Resolution Reconstruction from Video Clips

Rui Zhao, Raymond H. Chan

We propose a variational approach to obtain super-resolution images from multiple low-resolution frames extracted from video clips. First the displacement between the low-resolution frames and the reference frame are computed by an optical flow algorithm. Then a low-rank model is used to construct the reference frame in high-resolution by incorporating the information of the low-resolution frames. The model has two terms: a 2-norm data fidelity term and a nuclear-norm regularization term. Alternating direction method of multipliers is used to solve the model. Comparison of our methods with other models on synthetic and real video clips show that our resulting images are more accurate with less artifacts. It also provides much finer and discernable details.

NAJun 9, 2015
A FEAST Algorithm with oblique projection for generalized eigenvalue problems

Guojian Yin, Raymond H. Chan, Man-Chung Yeung

The contour-integral based eigensolvers are the recent efforts for computing the eigenvalues inside a given region in the complex plane. The best-known members are the Sakurai-Sugiura (SS) method, its stable version CIRR, and the FEAST algorithm. An attractive computational advantage of these methods is that they are easily parallelizable. The FEAST algorithm was developed for the generalized Hermitian eigenvalue problems. It is stable and accurate. However, it may fail when applied to non-Hermitian problems. In this paper, we extend the FEAST algorithm to non-Hermitian problems. The approach can be summarized as follows: (i) to construct a particular contour integral to form a subspace containing the desired eigenspace, and (ii) to use the oblique projection technique to extract desired eigenpairs with appropriately chosen test subspace. The related mathematical framework is established. We also address some implementation issues such as how to choose a suitable starting matrix and design good stopping criteria. Numerical experiments are provided to illustrate that our method is stable and efficient.

LGJul 2, 2014
Geometric Tight Frame based Stylometry for Art Authentication of van Gogh Paintings

Haixia Liu, Raymond H. Chan, Yuan Yao

This paper is about authenticating genuine van Gogh paintings from forgeries. The authentication process depends on two key steps: feature extraction and outlier detection. In this paper, a geometric tight frame and some simple statistics of the tight frame coefficients are used to extract features from the paintings. Then a forward stage-wise rank boosting is used to select a small set of features for more accurate classification so that van Gogh paintings are highly concentrated towards some center point while forgeries are spread out as outliers. Numerical results show that our method can achieve 86.08% classification accuracy under the leave-one-out cross-validation procedure. Our method also identifies five features that are much more predominant than other features. Using just these five features for classification, our method can give 88.61% classification accuracy which is the highest so far reported in literature. Evaluation of the five features is also performed on two hundred datasets generated by bootstrap sampling with replacement. The median and the mean are 88.61% and 87.77% respectively. Our results show that a small set of statistics of the tight frame coefficients along certain orientations can serve as discriminative features for van Gogh paintings. It is more important to look at the tail distributions of such directional coefficients than mean values and standard deviations. It reflects a highly consistent style in van Gogh's brushstroke movements, where many forgeries demonstrate a more diverse spread in these features.