Hua Li

CV
h-index49
64papers
1,054citations
Novelty47%
AI Score56

64 Papers

IVSep 19, 2023
Assessing the capacity of a denoising diffusion probabilistic model to reproduce spatial context

Rucha Deshpande, Muzaffer Özbey, Hua Li et al.

Diffusion models have emerged as a popular family of deep generative models (DGMs). In the literature, it has been claimed that one class of diffusion models -- denoising diffusion probabilistic models (DDPMs) -- demonstrate superior image synthesis performance as compared to generative adversarial networks (GANs). To date, these claims have been evaluated using either ensemble-based methods designed for natural images, or conventional measures of image quality such as structural similarity. However, there remains an important need to understand the extent to which DDPMs can reliably learn medical imaging domain-relevant information, which is referred to as `spatial context' in this work. To address this, a systematic assessment of the ability of DDPMs to learn spatial context relevant to medical imaging applications is reported for the first time. A key aspect of the studies is the use of stochastic context models (SCMs) to produce training data. In this way, the ability of the DDPMs to reliably reproduce spatial context can be quantitatively assessed by use of post-hoc image analyses. Error-rates in DDPM-generated ensembles are reported, and compared to those corresponding to a modern GAN. The studies reveal new and important insights regarding the capacity of DDPMs to learn spatial context. Notably, the results demonstrate that DDPMs hold significant capacity for generating contextually correct images that are `interpolated' between training samples, which may benefit data-augmentation tasks in ways that GANs cannot.

CVJul 11, 2023Code
Compact Twice Fusion Network for Edge Detection

Yachuan Li, Zongmin Li, Xavier Soria P. et al.

The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregate the complementary merits of multi-scale features by assigning weights to all features. Notwithstanding all this, the interference of texture noise makes the correct classification of some pixels still a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with less parameters and computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.

CVAug 17, 2022
Stereo Superpixel Segmentation Via Decoupled Dynamic Spatial-Embedding Fusion Network

Hua Li, Junyan Liang, Ruiqi Wu et al.

Stereo superpixel segmentation aims at grouping the discretizing pixels into perceptual regions through left and right views more collaboratively and efficiently. Existing superpixel segmentation algorithms mostly utilize color and spatial features as input, which may impose strong constraints on spatial information while utilizing the disparity information in terms of stereo image pairs. To alleviate this issue, we propose a stereo superpixel segmentation method with a decoupling mechanism of spatial information in this work. To decouple stereo disparity information and spatial information, the spatial information is temporarily removed before fusing the features of stereo image pairs, and a decoupled stereo fusion module (DSFM) is proposed to handle the stereo features alignment as well as occlusion problems. Moreover, since the spatial information is vital to superpixel segmentation, we further design a dynamic spatiality embedding module (DSEM) to re-add spatial information, and the weights of spatial information will be adaptively adjusted through the dynamic fusion (DF) mechanism in DSEM for achieving a finer segmentation. Comprehensive experimental results demonstrate that our method can achieve the state-of-the-art performance on the KITTI2015 and Cityscapes datasets, and also verify the efficiency when applied in salient object detection on NJU2K dataset. The source code will be available publicly after paper is accepted.

IVJun 23, 2022
A novel adversarial learning strategy for medical image classification

Zong Fan, Xiaohui Zhang, Jacob A. Gasienica et al.

Deep learning (DL) techniques have been extensively utilized for medical image classification. Most DL-based classification networks are generally structured hierarchically and optimized through the minimization of a single loss function measured at the end of the networks. However, such a single loss design could potentially lead to optimization of one specific value of interest but fail to leverage informative features from intermediate layers that might benefit classification performance and reduce the risk of overfitting. Recently, auxiliary convolutional neural networks (AuxCNNs) have been employed on top of traditional classification networks to facilitate the training of intermediate layers to improve classification performance and robustness. In this study, we proposed an adversarial learning-based AuxCNN to support the training of deep neural networks for medical image classification. Two main innovations were adopted in our AuxCNN classification framework. First, the proposed AuxCNN architecture includes an image generator and an image discriminator for extracting more informative image features for medical image classification, motivated by the concept of generative adversarial network (GAN) and its impressive ability in approximating target data distribution. Second, a hybrid loss function is designed to guide the model training by incorporating different objectives of the classification network and AuxCNN to reduce overfitting. Comprehensive experimental studies demonstrated the superior classification performance of the proposed model. The effect of the network-related factors on classification performance was investigated.

IVOct 11, 2022
Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework

Zong Fan, Ping Gong, Shanshan Tang et al.

Automatic breast lesion detection and classification is an important task in computer-aided diagnosis, in which breast ultrasound (BUS) imaging is a common and frequently used screening tool. Recently, a number of deep learning-based methods have been proposed for joint localization and classification of breast lesions using BUS images. In these methods, features extracted by a shared network trunk are appended by two independent network branches to achieve classification and localization. Improper information sharing might cause conflicts in feature optimization in the two branches and leads to performance degradation. Also, these methods generally require large amounts of pixel-level annotated data for model training. To overcome these limitations, we proposed a novel joint localization and classification model based on the attention mechanism and disentangled semi-supervised learning strategy. The model used in this study is composed of a classification network and an auxiliary lesion-aware network. By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance. The disentangled semi-supervised learning strategy only requires incomplete training datasets for model training. The proposed modularized framework allows flexible network replacement to be generalized for various applications. Experimental results on two different breast ultrasound image datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance are also investigated to gain deep insights into the designed framework.

49.1LGMay 28
Gradient Perturbation: Learning to Perturb Gradients for Adaptive Training

Hua Li

Deep neural network training involves both forward propagation (from features through logits to loss) and backward propagation (from loss through gradients to parameter updates). While perturbations along the forward chain, including feature perturbation, logit perturbation, and label perturbation, have been extensively studied, the backward chain's gradient perturbation has received little systematic investigation. In this paper, we establish a unified framework for gradient perturbation, revealing that existing methods such as Sharpness-Aware Minimization (SAM), gradient clipping, and gradient noise injection can all be interpreted as imposing specific forms of gradient perturbation. Analogous to the recently proposed Logit Perturbation Learning (LPL), we conjecture that amplifying the gradient norm for a class acts as positive augmentation (enhancing learning), while dampening it acts as negative augmentation (suppressing overfitting). Based on these observations, we propose Learning to Perturb Gradients (LPG), which adaptively perturbs logit-level gradients at the class level to achieve category-aware training. We also establish theoretical connections between gradient perturbation bounds and generalization guarantees via PAC-Bayesian analysis. Experiments on balanced classification, long-tail classification, and noisy label learning demonstrate that LPG consistently outperforms existing methods and can be combined with them as a plug-in module.

14.9LGMay 28
Learning to Perturb Hidden Representations for Generalizable Deep Learning

Hua Li

Deep neural networks process data through a cascade of representations: input features, hidden activations, logits, and loss. While perturbations at the input, logit, and label levels have been systematically studied, the intermediate hidden activations, which constitute the bulk of the network's computation, have received no unified perturbation analysis. In this paper, we establish a unified framework for hidden activation perturbation, revealing that Dropout, Manifold Mixup, adversarial feature perturbation, and related methods all impose specific forms of activation perturbation but with class-agnostic or random strategies. We conjecture that expansive perturbation (increasing activation norm) acts as positive augmentation, while contractive perturbation (decreasing activation norm) acts as negative augmentation, and that the perturbation layer determines whether the effect resembles input-level augmentation (shallow layers) or logit-level manipulation (deep layers). We propose Learning to Perturb Activations (LPA), which adaptively perturbs activations at a selected hidden layer with class-level perturbations learned via PGD. We further provide theoretical analysis connecting activation perturbation to flat minima and perturbation amplification through layers. Experiments on balanced classification, long-tail classification, and domain generalization demonstrate that LPA consistently outperforms existing methods and provides complementary benefits to logit perturbation methods such as LPL.

CVAug 6, 2024Code
Evaluation of Segment Anything Model 2: The Role of SAM2 in the Underwater Environment

Shijie Lian, Hua Li

With breakthroughs in large-scale modeling, the Segment Anything Model (SAM) and its extensions have been attempted for applications in various underwater visualization tasks in marine sciences, and have had a significant impact on the academic community. Recently, Meta has further developed the Segment Anything Model 2 (SAM2), which significantly improves running speed and segmentation accuracy compared to its predecessor. This report aims to explore the potential of SAM2 in marine science by evaluating it on the underwater instance segmentation benchmark datasets UIIS and USIS10K. The experiments show that the performance of SAM2 is extremely dependent on the type of user-provided prompts. When using the ground truth bounding box as prompt, SAM2 performed excellently in the underwater instance segmentation domain. However, when running in automatic mode, SAM2's ability with point prompts to sense and segment underwater instances is significantly degraded. It is hoped that this paper will inspire researchers to further explore the SAM model family in the underwater domain. The results and evaluation codes in this paper are available at https://github.com/LiamLian0727/UnderwaterSAM2Eval.

MLOct 8, 2022
Spectrally-Corrected and Regularized Linear Discriminant Analysis for Spiked Covariance Model

Hua Li, Wenya Luo, Zhidong Bai et al.

This paper proposes an improved linear discriminant analysis called spectrally-corrected and regularized LDA (SRLDA). This method integrates the design ideas of the sample spectrally-corrected covariance matrix and the regularized discriminant analysis. With the support of a large-dimensional random matrix analysis framework, it is proved that SRLDA has a linear classification global optimal solution under the spiked model assumption. According to simulation data analysis, the SRLDA classifier performs better than RLDA and ILDA and is closer to the theoretical classifier. Experiments on different data sets show that the SRLDA algorithm performs better in classification and dimensionality reduction than currently used tools.

54.2MLMay 23
Clustering based on Stochastic Dominance with application for risk averters and risk seekers

Hua Li, Xue Jia, Yilin Kang et al.

Stochastic Dominance (SD) theory provides a rigorous framework for selecting superior assets tailored to the asset allocation needs of investors with varying risk preferences (i.e., risk-averse, risk-seeking, and risk-neutral). However, traditional stock clustering methods typically rely on geometric metrics such as Euclidean distance, which often fail to effectively capture the intrinsic risk dominance relationships among assets. To address this limitation, this paper proposes an innovative clustering analysis framework based on SD test statistics. Methodologically, this study deeply integrates SD theory with machine learning algorithms. Transcending the limitations of traditional reliance on geometric distance, we innovatively utilize test statistics from first-, second-, and third-order SD to construct a "Stochastic Dominance Coefficient Matrix." Building upon this matrix, we modify the classic K-means and Hierarchical Clustering algorithms. Specifically, we derive 12 distinct algorithm variants tailored to different orders of SD relationships. Simultaneously, we construct the SD-SC coefficient and the SD-DBI index as specialized validity indices to evaluate the clustering performance. Empirically, we analyze constituent stock data from a representative developed market (the US NASDAQ Index) and an emerging market (China's CSI 100 Index). The results verify the effectiveness and robustness of the proposed method. Furthermore, we apply the clustering results to the modification of the Single Index Model and the construction of Global Minimum Variance Portfolios (GMVP). The findings demonstrate that the proposed method effectively facilitates customized asset allocation for investors, holding significant theoretical value and practical implications.

81.9CVMar 20
PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing

Jiadong Liang, Bojun Xiong, Jie Tian et al.

This paper primarily investigates the task of expression-only portrait video performance editing based on a driving video, which plays a crucial role in animation and film industries. Most existing research mainly focuses on portrait animation, which aims to animate a static portrait image according to the facial motion from the driving video. As a consequence, it remains challenging for them to disentangle the facial expression from head pose rotation and thus lack the ability to edit facial expression independently. In this paper, we propose PerformRecast, a versatile expression-only video editing method which is dedicated to recast the performance in existing film and animation. The key insight of our method comes from the characteristics of 3D Morphable Face Model (3DMM), which models the face identity, facial expression and head pose of 3D face mesh with separate parameters. Therefore, we improve the keypoints transformation formula in previous methods to make it more consistent with 3DMM model, which achieves a better disentanglement and provides users with much more fine-grained control. Furthermore, to avoid the misalignment around the boundary of face in generated results, we decouple the facial and non-facial regions of input portrait images and pre-train a teacher model to provide separate supervision for them. Extensive experiments show that our method produces high-quality results which are more faithful to the driving video, outperforming existing methods in both controllability and efficiency. Our code, data and trained models are available at https://youku-aigc.github.io/PerformRecast.

LGAug 17, 2022
The Counterfactual-Shapley Value: Attributing Change in System Metrics

Amit Sharma, Hua Li, Jian Jiao

Given an unexpected change in the output metric of a large-scale system, it is important to answer why the change occurred: which inputs caused the change in metric? A key component of such an attribution question is estimating the counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input. However, due to inherent stochasticity and complex interactions between parts of the system, it is difficult to model an output metric directly. We utilize the computational structure of a system to break up the modelling task into sub-parts, such that each sub-part corresponds to a more stable mechanism that can be modelled accurately over time. Using the system's structure also helps to view the metric as a computation over a structural causal model (SCM), thus providing a principled way to estimate counterfactuals. Specifically, we propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley, that is consistent with desirable axioms for attributing an observed change in the output metric. Unlike past work on causal shapley values, our proposed method can attribute a single observed change in output (rather than a population-level effect) and thus provides more accurate attribution scores when evaluated on simulated datasets. As a real-world application, we analyze a query-ad matching system with the goal of attributing observed change in a metric for ad matching density. Attribution scores explain how query volume and ad demand from different query categories affect the ad matching density, leading to actionable insights and uncovering the role of external events (e.g., "Cheetah Day") in driving the matching density.

CVMay 21, 2025Code
Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation

Hua Li, Shijie Lian, Zhiyuan Li et al.

With recent breakthroughs in large-scale modeling, the Segment Anything Model (SAM) has demonstrated significant potential in a variety of visual applications. However, due to the lack of underwater domain expertise, SAM and its variants face performance limitations in end-to-end underwater instance segmentation tasks, while their higher computational requirements further hinder their application in underwater scenarios. To address this challenge, we propose a large-scale underwater instance segmentation dataset, UIIS10K, which includes 10,048 images with pixel-level annotations for 10 categories. Then, we introduce UWSAM, an efficient model designed for automatic and accurate segmentation of underwater instances. UWSAM efficiently distills knowledge from the SAM ViT-Huge image encoder into the smaller ViT-Small image encoder via the Mask GAT-based Underwater Knowledge Distillation (MG-UKD) method for effective visual representation learning. Furthermore, we design an End-to-end Underwater Prompt Generator (EUPG) for UWSAM, which automatically generates underwater prompts instead of explicitly providing foreground points or boxes as prompts, thus enabling the network to locate underwater instances accurately for efficient segmentation. Comprehensive experimental results show that our model is effective, achieving significant performance improvements over state-of-the-art methods on multiple underwater instance datasets. Datasets and codes are available at https://github.com/LiamLian0727/UIIS10K.

CVDec 14, 2024Code
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes

Zhangjun Zhou, Yiping Li, Chunlin Zhong et al.

While the human visual system employs distinct mechanisms to perceive salient and camouflaged objects, existing models struggle to disentangle these tasks. Specifically, salient object detection (SOD) models frequently misclassify camouflaged objects as salient, while camouflaged object detection (COD) models conversely misinterpret salient objects as camouflaged. We hypothesize that this can be attributed to two factors: (i) the specific annotation paradigm of current SOD and COD datasets, and (ii) the lack of explicit attribute relationship modeling in current models. Prevalent SOD/COD datasets enforce a mutual exclusivity constraint, assuming scenes contain either salient or camouflaged objects, which poorly aligns with the real world. Furthermore, current SOD/COD methods are primarily designed for these highly constrained datasets and lack explicit modeling of the relationship between salient and camouflaged objects. In this paper, to promote the development of unconstrained salient and camouflaged object detection, we construct a large-scale dataset, USC12K, which features comprehensive labels and four different scenes that cover all possible logical existence scenarios of both salient and camouflaged objects. To explicitly model the relationship between salient and camouflaged objects, we propose a model called USCNet, which introduces two distinct prompt query mechanisms for modeling inter-sample and intra-sample attribute relationships. Additionally, to assess the model's ability to distinguish between salient and camouflaged objects, we design an evaluation metric called CSCS. The proposed method achieves state-of-the-art performance across all scenes in various metrics. The code and dataset will be available at https://github.com/ssecv/USCNet.

CVJun 10, 2024Code
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

Shijie Lian, Ziyi Zhang, Hua Li et al.

With the breakthrough of large models, Segment Anything Model (SAM) and its extensions have been attempted to apply in diverse tasks of computer vision. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, which often suffer from low segmentation accuracy due to the complex underwater circumstances and the adaptive ability of models. Moreover, the lack of large-scale datasets with pixel-level salient instance annotations has impeded the development of machine learning techniques in this field. To address these issues, we construct the first large-scale underwater salient instance segmentation dataset (USIS10K), which contains 10,632 underwater images with pixel-level annotations in 7 categories from various underwater scenes. Then, we propose an Underwater Salient Instance Segmentation architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain. We devise an Underwater Adaptive Visual Transformer (UA-ViT) encoder to incorporate underwater domain visual prompts into the segmentation network. We further design an out-of-the-box underwater Salient Feature Prompter Generator (SFPG) to automatically generate salient prompters instead of explicitly providing foreground points or boxes as prompts in SAM. Comprehensive experimental results show that our USIS-SAM method can achieve superior performance on USIS10K datasets compared to the state-of-the-art methods. Datasets and codes are released on https://github.com/LiamLian0727/USIS10K.

58.7CLMar 30
DongYuan: An LLM-Based Framework for Integrative Chinese and Western Medicine Spleen-Stomach Disorders Diagnosis

Hua Li, Yingying Li, Xiaobin Feng et al.

The clinical burden of spleen-stomach disorders is substantial. While large language models (LLMs) offer new potential for medical applications, they face three major challenges in the context of integrative Chinese and Western medicine (ICWM): a lack of high-quality data, the absence of models capable of effectively integrating the reasoning logic of traditional Chinese medicine (TCM) syndrome differentiation with that of Western medical (WM) disease diagnosis, and the shortage of a standardized evaluation benchmark. To address these interrelated challenges, we propose DongYuan, an ICWM spleen-stomach diagnostic framework. Specifically, three ICWM datasets (SSDF-Syndrome, SSDF-Dialogue, and SSDF-PD) were curated to fill the gap in high-quality data for spleen-stomach disorders. We then developed SSDF-Core, a core diagnostic LLM that acquires robust ICWM reasoning capabilities through a two-stage training regimen of supervised fine-tuning. tuning (SFT) and direct preference optimization (DPO), and complemented it with SSDF-Navigator, a pluggable consultation navigation model designed to optimize clinical inquiry strategies. Additionally, we established SSDF-Bench, a comprehensive evaluation benchmark focused on ICWM diagnosis of spleen-stomach disorders. Experimental results demonstrate that SSDF-Core significantly outperforms 12 mainstream baselines on SSDF-Bench. DongYuan lays a solid methodological foundation and provides practical technical references for the future development of intelligent ICWM diagnostic systems.

IRNov 10, 2025
Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

Yu Hou, Hua Li, Ha Young Kim et al.

Diffusion models recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However, training such models from scratch is both computationally expensive and yields diminishing returns once convergence is reached. To remedy these challenges, we propose ReFiT, a new framework that integrates Reinforcement learning (RL)-based Fine-Tuning into diffusion-based recommender systems. In contrast to prior RL approaches for diffusion models depending on external reward models, ReFiT adopts a task-aligned design: it formulates the denoising trajectory as a Markov decision process (MDP) and incorporates a collaborative signal-aware reward function that directly reflects recommendation quality. By tightly coupling the MDP structure with this reward signal, ReFiT empowers the RL agent to exploit high-order connectivity for fine-grained optimization, while avoiding the noisy or uninformative feedback common in naive reward designs. Leveraging policy gradient optimization, ReFiT maximizes exact log-likelihood of observed interactions, thereby enabling effective post hoc fine-tuning of diffusion recommenders. Comprehensive experiments on wide-ranging real-world datasets demonstrate that the proposed ReFiT framework (a) exhibits substantial performance gains over strong competitors (up to 36.3% on sequential recommendation), (b) demonstrates strong efficiency with linear complexity in the number of users or items, and (c) generalizes well across multiple diffusion-based recommendation scenarios. The source code and datasets are publicly available at https://anonymous.4open.science/r/ReFiT-4C60.

IVMay 10, 2024
Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging

Zhuchen Shao, Mark A. Anastasio, Hua Li

Purpose: Quantitative phase imaging (QPI) is a label-free technique that provides high-contrast images of tissues and cells without the use of chemicals or dyes. Accurate semantic segmentation of cells in QPI is essential for various biomedical applications. While DM-based segmentation has demonstrated promising results, the requirement for multiple sampling steps reduces efficiency. This study aims to enhance DM-based segmentation by introducing prior-guided content information into the starting noise, thereby minimizing inefficiencies associated with multiple sampling. Approach: A prior-guided mechanism is introduced into DM-based segmentation, replacing randomly sampled starting noise with noise informed by content information. This mechanism utilizes another trained DM and DDIM inversion to incorporate content information from the to-be-segmented images into the starting noise. An evaluation method is also proposed to assess the quality of the starting noise, considering both content and distribution information. Results: Extensive experiments on various QPI datasets for cell segmentation showed that the proposed method achieved superior performance in DM-based segmentation with only a single sampling. Ablation studies and visual analysis further highlighted the significance of content priors in DM-based segmentation. Conclusion: The proposed method effectively leverages prior content information to improve DM-based segmentation, providing accurate results while reducing the need for multiple samplings. The findings emphasize the importance of integrating content priors into DM-based segmentation methods for optimal performance.

CVOct 20, 2025
Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset

Chuhong Wang, Hua Li, Chongyi Li et al.

With the development of underwater exploration and marine protection, underwater vision tasks are widespread. Due to the degraded underwater environment, characterized by color distortion, low contrast, and blurring, camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings. Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes. To address these issues, we introduce the first underwater camouflaged instance segmentation (UCIS) dataset, abbreviated as UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations. In addition, we propose an Underwater Camouflaged Instance Segmentation network based on Segment Anything Model (UCIS-SAM). Our UCIS-SAM includes three key modules. First, the Channel Balance Optimization Module (CBOM) enhances channel characteristics to improve underwater feature learning, effectively addressing the model's limited understanding of underwater environments. Second, the Frequency Domain True Integration Module (FDTIM) is proposed to emphasize intrinsic object features and reduce interference from camouflage patterns, enhancing the segmentation performance of camouflaged objects blending with their surroundings. Finally, the Multi-scale Feature Frequency Aggregation Module (MFFAM) is designed to strengthen the boundaries of low-contrast camouflaged instances across multiple frequency bands, improving the model's ability to achieve more precise segmentation of camouflaged objects. Extensive experiments on the proposed UCIS4K and public benchmarks show that our UCIS-SAM outperforms state-of-the-art approaches.

CVOct 14, 2025
WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation

Runting Li, Shijie Lian, Hua Li et al.

Underwater Salient Object Detection (USOD) faces significant challenges, including underwater image quality degradation and domain gaps. Existing methods tend to ignore the physical principles of underwater imaging or simply treat degradation phenomena in underwater images as interference factors that must be eliminated, failing to fully exploit the valuable information they contain. We propose WaterFlow, a rectified flow-based framework for underwater salient object detection that innovatively incorporates underwater physical imaging information as explicit priors directly into the network training process and introduces temporal dimension modeling, significantly enhancing the model's capability for salient object identification. On the USOD10K dataset, WaterFlow achieves a 0.072 gain in S_m, demonstrating the effectiveness and superiority of our method. The code will be published after the acceptance.

CVAug 14, 2025
GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images

Mengyu Ren, Yutong Li, Hua Li et al.

Salient object detection (SOD) in optical remote sensing images (ORSIs) faces numerous challenges, including significant variations in target scales and low contrast between targets and the background. Existing methods based on vision transformers (ViTs) and convolutional neural networks (CNNs) architectures aim to leverage both global and local features, but the difficulty in effectively integrating these heterogeneous features limits their overall performance. To overcome these limitations, we propose a graph-enhanced contextual and regional perception network (GCRPNet), which builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. Specifically, we employ the visual state space (VSS) encoder to extract multi-scale features. To further achieve deep guidance and enhancement of these features, we first design a difference-similarity guided hierarchical graph attention module (DS-HGAM). This module strengthens cross-layer interaction capabilities between features of different scales while enhancing the model's structural perception,allowing it to distinguish between foreground and background more effectively. Then, we design the LEVSS block as the decoder of GCRPNet. This module integrates our proposed adaptive scanning strategy and multi-granularity collaborative attention enhancement module (MCAEM). It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information and enhancing Mamba's local modeling capability. Extensive experimental results demonstrate that the proposed model achieves state-of-the-art performance, validating its effectiveness and superiority.

CVMay 12, 2025
TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian

Shijie Lian, Ziyi Zhang, Laurence Tianruo Yang and et al.

Underwater 3D scene reconstruction is crucial for undewater robotic perception and navigation. However, the task is significantly challenged by the complex interplay between light propagation, water medium, and object surfaces, with existing methods unable to model their interactions accurately. Additionally, expensive training and rendering costs limit their practical application in underwater robotic systems. Therefore, we propose Tensorized Underwater Gaussian Splatting (TUGS), which can effectively solve the modeling challenges of the complex interactions between object geometries and water media while achieving significant parameter reduction. TUGS employs lightweight tensorized higher-order Gaussians with a physics-based underwater Adaptive Medium Estimation (AME) module, enabling accurate simulation of both light attenuation and backscatter effects in underwater environments. Compared to other NeRF-based and GS-based methods designed for underwater, TUGS is able to render high-quality underwater images with faster rendering speeds and less memory usage. Extensive experiments on real-world underwater datasets have demonstrated that TUGS can efficiently achieve superior reconstruction quality using a limited number of parameters, making it particularly suitable for memory-constrained underwater UAV applications

CVApr 30, 2025
Mamba Based Feature Extraction And Adaptive Multilevel Feature Fusion For 3D Tumor Segmentation From Multi-modal Medical Image

Zexin Ji, Beiji Zou, Xiaoyan Kui et al.

Multi-modal 3D medical image segmentation aims to accurately identify tumor regions across different modalities, facing challenges from variations in image intensity and tumor morphology. Traditional convolutional neural network (CNN)-based methods struggle with capturing global features, while Transformers-based methods, despite effectively capturing global context, encounter high computational costs in 3D medical image segmentation. The Mamba model combines linear scalability with long-distance modeling, making it a promising approach for visual representation learning. However, Mamba-based 3D multi-modal segmentation still struggles to leverage modality-specific features and fuse complementary information effectively. In this paper, we propose a Mamba based feature extraction and adaptive multilevel feature fusion for 3D tumor segmentation using multi-modal medical image. We first develop the specific modality Mamba encoder to efficiently extract long-range relevant features that represent anatomical and pathological structures present in each modality. Moreover, we design an bi-level synergistic integration block that dynamically merges multi-modal and multi-level complementary features by the modality attention and channel attention learning. Lastly, the decoder combines deep semantic information with fine-grained details to generate the tumor segmentation map. Experimental results on medical image datasets (PET/CT and MRI multi-sequence) show that our approach achieve competitive performance compared to the state-of-the-art CNN, Transformer, and Mamba-based approaches.

LGMar 17, 2025
Spectrally-Corrected and Regularized QDA Classifier for Spiked Covariance Model

Wenya Luo, Hua Li, Zhidong Bai et al.

Quadratic discriminant analysis (QDA) is a widely used method for classification problems, particularly preferable over Linear Discriminant Analysis (LDA) for heterogeneous data. However, QDA loses its effectiveness in high-dimensional settings, where the data dimension and sample size tend to infinity. To address this issue, we propose a novel QDA method utilizing spectral correction and regularization techniques, termed SR-QDA. The regularization parameters in our method are selected by maximizing the Fisher-discriminant ratio. We compare SR-QDA with QDA, regularized quadratic discriminant analysis (R-QDA), and several other competitors. The results indicate that SR-QDA performs exceptionally well, especially in moderate and high-dimensional situations. Empirical experiments across diverse datasets further support this conclusion.

IRJun 20, 2024
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning

Amit Sharma, Hua Li, Xue Li et al.

Given an input query, a recommendation model is trained using user feedback data (e.g., click data) to output a ranked list of items. In real-world systems, besides accuracy, an important consideration for a new model is novelty of its top-k recommendations w.r.t. an existing deployed model. However, novelty of top-k items is a difficult goal to optimize a model for, since it involves a non-differentiable sorting operation on the model's predictions. Moreover, novel items, by definition, do not have any user feedback data. Given the semantic capabilities of large language models, we address these problems using a reinforcement learning (RL) formulation where large language models provide feedback for the novel items. However, given millions of candidate items, the sample complexity of a standard RL algorithm can be prohibitively high. To reduce sample complexity, we reduce the top-k list reward to a set of item-wise rewards and reformulate the state space to consist of <query, item> tuples such that the action space is reduced to a binary decision; and show that this reformulation results in a significantly lower complexity when the number of items is large. We evaluate the proposed algorithm on improving novelty for a query-ad recommendation task on a large-scale search engine. Compared to supervised finetuning on recent <query, ad> pairs, the proposed RL-based algorithm leads to significant novelty gains with minimal loss in recall. We obtain similar results on the ORCAS query-webpage matching dataset and a product recommendation dataset based on Amazon reviews.

AIJun 15, 2024
Task Facet Learning: A Structured Approach to Prompt Optimization

Gurusha Juneja, Gautam Jajoo, Nagarajan Natarajan et al.

Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model. Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whether existing algorithmic approaches, based on iteratively editing a given prompt or automatically selecting a few in-context examples, can cover the multiple facets required to solve a complex task. In this work, we view prompt optimization as that of learning multiple facets of a task from a set of training examples. We exploit structure in the prompt optimization problem and break down a prompt into loosely coupled semantic sections. The proposed algorithm, UniPrompt, (1) clusters the input space and uses clustered batches so that each batch likely corresponds to a different facet of the task, and (2) utilizes a feedback mechanism to propose adding, editing or deleting a section, which in turn is aggregated over a batch to capture generalizable facets. Empirical evaluation on multiple datasets and a real-world task shows that prompts generated using \shortname{} obtain higher accuracy than human-tuned prompts and those from state-of-the-art methods. In particular, our algorithm can generate long, complex prompts that existing methods are unable to generate. Code for UniPrompt is available at https://aka.ms/uniprompt.

IVMay 19, 2023
A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

Xiyao Jin, Yao Hao, Jessica Hilliard et al.

To safely deploy deep learning models in the clinic, a quality assurance framework is needed for routine or continuous monitoring of input-domain shift and the models' performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish a QA framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients were collected, including one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed by utilizing a trained Denoising autoencoder (DAE) and two hand-engineered features. Another Variational Autoencoder (VAE) was also trained to estimate the shape quality of the auto-segmentation results. Using the extracted features from the image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by Dice coefficient similarity (DSC). The framework was tested across 19 segmentation models to evaluate the generalizability of the entire framework. As results, the predicted DSC of regression models achieved a mean absolute error (MAE) ranging from 0.036 to 0.046 with an averaged MAE of 0.041. When tested on the benchmark dataset, the performances of all segmentation models were not significantly affected by scanning parameters: FOV, slice thickness and reconstructions kernels. For input images with Poisson noise, CNN-based segmentation models demonstrated a decreased DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.

CVFeb 27, 2022
Application of DatasetGAN in medical imaging: preliminary studies

Zong Fan, Varun Kelkar, Mark A. Anastasio et al.

Generative adversarial networks (GANs) have been widely investigated for many potential applications in medical imaging. DatasetGAN is a recently proposed framework based on modern GANs that can synthesize high-quality segmented images while requiring only a small set of annotated training images. The synthesized annotated images could be potentially employed for many medical imaging applications, where images with segmentation information are required. However, to the best of our knowledge, there are no published studies focusing on its applications to medical imaging. In this work, preliminary studies were conducted to investigate the utility of DatasetGAN in medical imaging. Three improvements were proposed to the original DatasetGAN framework, considering the unique characteristics of medical images. The synthesized segmented images by DatasetGAN were visually evaluated. The trained DatasetGAN was further analyzed by evaluating the performance of a pre-defined image segmentation technique, which was trained by the use of the synthesized datasets. The effectiveness, concerns, and potential usage of DatasetGAN were discussed.

LGFeb 10, 2022
STG-GAN: A spatiotemporal graph generative adversarial networks for short-term passenger flow prediction in urban rail transit systems

Jinlei Zhang, Hua Li, Lixing Yang et al.

Short-term passenger flow prediction is an important but challenging task for better managing urban rail transit (URT) systems. Some emerging deep learning models provide good insights to improve short-term prediction accuracy. However, there exist many complex spatiotemporal dependencies in URT systems. Most previous methods only consider the absolute error between ground truth and predictions as the optimization objective, which fails to account for spatial and temporal constraints on the predictions. Furthermore, a large number of existing prediction models introduce complex neural network layers to improve accuracy while ignoring their training efficiency and memory occupancy, decreasing the chances to be applied to the real world. To overcome these limitations, we propose a novel deep learning-based spatiotemporal graph generative adversarial network (STG-GAN) model with higher prediction accuracy, higher efficiency, and lower memory occupancy to predict short-term passenger flows of the URT network. Our model consists of two major parts, which are optimized in an adversarial learning manner: (1) a generator network including gated temporal conventional networks (TCN) and weight sharing graph convolution networks (GCN) to capture structural spatiotemporal dependencies and generate predictions with a relatively small computational burden; (2) a discriminator network including a spatial discriminator and a temporal discriminator to enhance the spatial and temporal constraints of the predictions. The STG-GAN is evaluated on two large-scale real-world datasets from Beijing Subway. A comparison with those of several state-of-the-art models illustrates its superiority and robustness. This study can provide critical experience in conducting short-term passenger flow predictions, especially from the perspective of real-world applications.

CVJan 3, 2022
Gaussian-Hermite Moment Invariants of General Multi-Channel Functions

Hanlin Mo, Hua Li, Guoying Zhao

With the development of data acquisition technology, large amounts of multi-channel data are collected and widely used in many fields. Most of them, such as RGB images and vector fields, can be expressed as different types of multi-channel functions. Feature extraction of multi-channel data for identifying interest patterns is a critical but challenging task. This paper focuses on constructing moment-based features of general multi-channel functions. Specifically, we define two transform models, rotation-affine transform and total rotation transform, to describe real deformations of multi-channel data. Then, we design a structural framework to generate Gaussian-Hermite moment invariants for these two transform models systematically. It is the first time that a unified framework has been proposed in the literature to construct orthogonal moment invariants of general multi-channel functions. Given a specific type of multi-channel data, we demonstrate how to utilize the new method to derive all possible invariants and eliminate dependences among them. We obtain independent sets of invariants with low orders and low degrees for RGB images, 2D vector fields and color volume data. Based on synthetic and real multi-channel data, we conduct extensive experiments to evaluate the stability and discriminability of these invariants and their robustness to noise. The results show that new moment invariants significantly outperform previous moment invariants of multi-channel data in RGB image classification and vortex detection in 2D vector fields.

IVJul 6, 2021
Impact of deep learning-based image super-resolution on binary signal detection

Xiaohui Zhang, Varun A. Kelkar, Jason Granstedt et al.

Deep learning-based image super-resolution (DL-SR) has shown great promise in medical imaging applications. To date, most of the proposed methods for DL-SR have only been assessed by use of traditional measures of image quality (IQ) that are commonly employed in the field of computer vision. However, the impact of these methods on objective measures of image quality that are relevant to medical imaging tasks remains largely unexplored. In this study, we investigate the impact of DL-SR methods on binary signal detection performance. Two popular DL-SR methods, the super-resolution convolutional neural network (SRCNN) and the super-resolution generative adversarial network (SRGAN), were trained by use of simulated medical image data. Binary signal-known-exactly with background-known-statistically (SKE/BKS) and signal-known-statistically with background-known-statistically (SKS/BKS) detection tasks were formulated. Numerical observers, which included a neural network-approximated ideal observer and common linear numerical observers, were employed to assess the impact of DL-SR on task performance. The impact of the complexity of the DL-SR network architectures on task-performance was quantified. In addition, the utility of DL-SR for improving the task-performance of sub-optimal observers was investigated. Our numerical experiments confirmed that, as expected, DL-SR could improve traditional measures of IQ. However, for many of the study designs considered, the DL-SR methods provided little or no improvement in task performance and could even degrade it. It was observed that DL-SR could improve the task-performance of sub-optimal observers under certain conditions. The presented study highlights the urgent need for the objective assessment of DL-SR methods and suggests avenues for improving their efficacy in medical imaging applications.

IVJun 27, 2021
Learning stochastic object models from medical imaging measurements by use of advanced ambient generative adversarial networks

Weimin Zhou, Sayantan Bhadra, Frank J. Brooks et al.

Purpose: To objectively assess new medical imaging technologies via computer-simulations, it is important to account for the variability in the ensemble of objects to be imaged. This source of variability can be described by stochastic object models (SOMs). It is generally desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system, but this task has remained challenging. Approach: A generative adversarial network (GAN)-based method that employs AmbientGANs with modern progressive or multiresolution training approaches is proposed. AmbientGANs established using the proposed training procedure are systematically validated in a controlled way using computer-simulated magnetic resonance imaging (MRI) data corresponding to a stylized imaging system. Emulated single-coil experimental MRI data are also employed to demonstrate the methods under less stylized conditions. Results: The proposed AmbientGAN method can generate clean images when the imaging measurements are contaminated by measurement noise. When the imaging measurement data are incomplete, the proposed AmbientGAN can reliably learn the distribution of the measurement components of the objects. Conclusions: Both visual examinations and quantitative analyses, including task-specific validations using the Hotelling observer, demonstrated that the proposed AmbientGAN method holds promise to establish realistic SOMs from imaging measurements.

IVJan 30, 2021
Advancing the AmbientGAN for learning stochastic object models

Weimin Zhou, Sayantan Bhadra, Frank J. Brooks et al.

Medical imaging systems are commonly assessed and optimized by use of objective-measures of image quality (IQ) that quantify the performance of an observer at specific tasks. Variation in the objects to-be-imaged is an important source of variability that can significantly limit observer performance. This object variability can be described by stochastic object models (SOMs). In order to establish SOMs that can accurately model realistic object variability, it is desirable to use experimental data. To achieve this, an augmented generative adversarial network (GAN) architecture called AmbientGAN has been developed and investigated. However, AmbientGANs cannot be immediately trained by use of advanced GAN training methods such as the progressive growing of GANs (ProGANs). Therefore, the ability of AmbientGANs to establish realistic object models is limited. To circumvent this, a progressively-growing AmbientGAN (ProAmGAN) has been proposed. However, ProAmGANs are designed for generating two-dimensional (2D) images while medical imaging modalities are commonly employed for imaging three-dimensional (3D) objects. Moreover, ProAmGANs that employ traditional generator architectures lack the ability to control specific image features such as fine-scale textures that are frequently considered when optimizing imaging systems. In this study, we address these limitations by proposing two advanced AmbientGAN architectures: 3D ProAmGANs and Style-AmbientGANs (StyAmGANs). Stylized numerical studies involving magnetic resonance (MR) imaging systems are conducted. The ability of 3D ProAmGANs to learn 3D SOMs from imaging measurements and the ability of StyAmGANs to control fine-scale textures of synthesized objects are demonstrated.

CGJan 21, 2021
Geometric Moment Invariants to Motion Blur

Hongxiang Hao., Hanlin Mo., Hua Li

In this paper, we focus on removing interference of motion blur by the derivation of motion blur invariants.Unlike earlier work, we don't restore any blurred image. Based on geometric moment and mathematical model of motion blur, we prove that geometric moments of blurred image and original image are linearly related. Depending on this property, we can analyse whether an existing moment-based feature is invariant to motion blur. Surprisingly, we find some geometric moment invariants are invariants to not only spatial transform but also motion blur. Meanwhile, we test invariance and robustness of these invariants using synthetic and real blur image datasets. And the results show these invariants outperform some widely used blur moment invariants and non-moment image features in image retrieval, classification and template matching.

CVDec 11, 2020
Superpixel Segmentation Based on Spatially Constrained Subspace Clustering

Hua Li, Yuheng Jia, Runmin Cong et al.

Superpixel segmentation aims at dividing the input image into some representative regions containing pixels with similar and consistent intrinsic properties, without any prior knowledge about the shape and size of each superpixel. In this paper, to alleviate the limitation of superpixel segmentation applied in practical industrial tasks that detailed boundaries are difficult to be kept, we regard each representative region with independent semantic information as a subspace, and correspondingly formulate superpixel segmentation as a subspace clustering problem to preserve more detailed content boundaries. We show that a simple integration of superpixel segmentation with the conventional subspace clustering does not effectively work due to the spatial correlation of the pixels within a superpixel, which may lead to boundary confusion and segmentation error when the correlation is ignored. Consequently, we devise a spatial regularization and propose a novel convex locality-constrained subspace clustering model that is able to constrain the spatial adjacent pixels with similar attributes to be clustered into a superpixel and generate the content-aware superpixels with more detailed boundaries. Finally, the proposed model is solved by an efficient alternating direction method of multipliers (ADMM) solver. Experiments on different standard datasets demonstrate that the proposed method achieves superior performance both quantitatively and qualitatively compared with some state-of-the-art methods.

CVDec 10, 2020
Exploiting Diverse Characteristics and Adversarial Ambivalence for Domain Adaptive Segmentation

Bowen Cai, Huan Fu, Rongfei Jia et al.

Adapting semantic segmentation models to new domains is an important but challenging problem. Recently enlightening progress has been made, but the performance of existing methods are unsatisfactory on real datasets where the new target domain comprises of heterogeneous sub-domains (e.g., diverse weather characteristics). We point out that carefully reasoning about the multiple modalities in the target domain can improve the robustness of adaptation models. To this end, we propose a condition-guided adaptation framework that is empowered by a special attentive progressive adversarial training (APAT) mechanism and a novel self-training policy. The APAT strategy progressively performs condition-specific alignment and attentive global feature matching. The new self-training scheme exploits the adversarial ambivalences of easy and hard adaptation regions and the correlations among target sub-domains effectively. We evaluate our method (DCAA) on various adaptation scenarios where the target images vary in weather conditions. The comparisons against baselines and the state-of-the-art approaches demonstrate the superiority of DCAA over the competitors.

IVNov 7, 2020
Deeply-Supervised Density Regression for Automatic Cell Counting in Microscopy Images

Shenghua He, Kyaw Thu Minn, Lilianna Solnica-Krezel et al.

Accurately counting the number of cells in microscopy images is required in many medical diagnosis and biological studies. This task is tedious, time-consuming, and prone to subjective errors. However, designing automatic counting methods remains challenging due to low image contrast, complex background, large variance in cell shapes and counts, and significant cell occlusions in two-dimensional microscopy images. In this study, we proposed a new density regression-based method for automatically counting cells in microscopy images. The proposed method processes two innovations compared to other state-of-the-art density regression-based methods. First, the density regression model (DRM) is designed as a concatenated fully convolutional regression network (C-FCRN) to employ multi-scale image features for the estimation of cell density maps from given images. Second, auxiliary convolutional neural networks (AuxCNNs) are employed to assist in the training of intermediate layers of the designed C-FCRN to improve the DRM performance on unseen datasets. Experimental studies evaluated on four datasets demonstrate the superior performance of the proposed method.

CVOct 2, 2020
A Parallel Down-Up Fusion Network for Salient Object Detection in Optical Remote Sensing Images

Chongyi Li, Runmin Cong, Chunle Guo et al.

The diverse spatial resolutions, various object types, scales and orientations, and cluttered backgrounds in optical remote sensing images (RSIs) challenge the current salient object detection (SOD) approaches. It is commonly unsatisfactory to directly employ the SOD approaches designed for nature scene images (NSIs) to RSIs. In this paper, we propose a novel Parallel Down-up Fusion network (PDF-Net) for SOD in optical RSIs, which takes full advantage of the in-path low- and high-level features and cross-path multi-resolution features to distinguish diversely scaled salient objects and suppress the cluttered backgrounds. To be specific, keeping a key observation that the salient objects still are salient no matter the resolutions of images are in mind, the PDF-Net takes successive down-sampling to form five parallel paths and perceive scaled salient objects that are commonly existed in optical RSIs. Meanwhile, we adopt the dense connections to take advantage of both low- and high-level information in the same path and build up the relations of cross paths, which explicitly yield strong feature representations. At last, we fuse the multiple-resolution features in parallel paths to combine the benefits of the features with different resolutions, i.e., the high-resolution feature consisting of complete structure and clear details while the low-resolution features highlighting the scaled salient objects. Extensive experiments on the ORSSD dataset demonstrate that the proposed network is superior to the state-of-the-art approaches both qualitatively and quantitatively.

SPMay 31, 2020
Two-stage short-term wind power forecasting algorithm using different feature-learning models

Jiancheng Qin, Jin Yang, Ying Chen et al.

Two-stage ensemble-based forecasting methods have been studied extensively in the wind power forecasting field. However, deep learning-based wind power forecasting studies have not investigated two aspects. In the first stage, different learning structures considering multiple inputs and multiple outputs have not been discussed. In the second stage, the model extrapolation issue has not been investigated. Therefore, we develop four deep neural networks for the first stage to learn data features considering the input-and-output structure. We then explore the model extrapolation issue in the second stage using different modeling methods. Considering the overfitting issue, we propose a new moving window-based algorithm using a validation set in the first stage to update the training data in both stages with two different moving window processes.Experiments were conducted at three wind farms, and the results demonstrate that the model with single input multiple output structure obtains better forecasting accuracy compared to existing models. In addition, the ridge regression method results in a better ensemble model that can further improve forecasting accuracy compared to existing machine learning methods. Finally, the proposed two-stage forecasting algorithm can generate more accurate and stable results than existing algorithms.

SPMay 29, 2020
Approximating the Ideal Observer for joint signal detection and localization tasks by use of supervised learning methods

Weimin Zhou, Hua Li, Mark A. Anastasio

Medical imaging systems are commonly assessed and optimized by use of objective measures of image quality (IQ). The Ideal Observer (IO) performance has been advocated to provide a figure-of-merit for use in assessing and optimizing imaging systems because the IO sets an upper performance limit among all observers. When joint signal detection and localization tasks are considered, the IO that employs a modified generalized likelihood ratio test maximizes observer performance as characterized by the localization receiver operating characteristic (LROC) curve. Computations of likelihood ratios are analytically intractable in the majority of cases. Therefore, sampling-based methods that employ Markov-Chain Monte Carlo (MCMC) techniques have been developed to approximate the likelihood ratios. However, the applications of MCMC methods have been limited to relatively simple object models. Supervised learning-based methods that employ convolutional neural networks have been recently developed to approximate the IO for binary signal detection tasks. In this paper, the ability of supervised learning-based methods to approximate the IO for joint signal detection and localization tasks is explored. Both background-known-exactly and background-known-statistically signal detection and localization tasks are considered. The considered object models include a lumpy object model and a clustered lumpy model, and the considered measurement noise models include Laplacian noise, Gaussian noise, and mixed Poisson-Gaussian noise. The LROC curves produced by the supervised learning-based method are compared to those produced by the MCMC approach or analytical computation when feasible. The potential utility of the proposed method for computing objective measures of IQ for optimizing imaging system performance is explored.

IVMay 29, 2020
Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs

Weimin Zhou, Sayantan Bhadra, Frank J. Brooks et al.

It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs). A SOM is a generative model that can be employed to establish an ensemble of to-be-imaged objects with prescribed statistical properties. In order to accurately model variations in anatomical structures and object textures, it is desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system. Deep generative neural networks, such as generative adversarial networks (GANs) hold great potential for this task. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures, such as progressive growing, which limited its ability to be applied to realistically sized medical image data. To circumvent this, in this work, a new Progressive Growing AmbientGAN (ProAmGAN) strategy is developed for establishing SOMs from medical imaging measurements. Stylized numerical studies corresponding to common medical imaging modalities are conducted to demonstrate and validate the proposed method for establishing SOMs.

CVApr 21, 2020
Spatio-Temporal Dual Affine Differential Invariant for Skeleton-based Action Recognition

Qi Li, Hanlin Mo, Jinghan Zhao et al.

The dynamics of human skeletons have significant information for the task of action recognition. The similarity between trajectories of corresponding joints is an indicating feature of the same action, while this similarity may subject to some distortions that can be modeled as the combination of spatial and temporal affine transformations. In this work, we propose a novel feature called spatio-temporal dual affine differential invariant (STDADI). Furthermore, in order to improve the generalization ability of neural networks, a channel augmentation method is proposed. On the large scale action recognition dataset NTU-RGB+D, and its extended version NTU-RGB+D 120, it achieves remarkable improvements over previous state-of-the-art methods.

CVFeb 3, 2020
Learning Numerical Observers using Unsupervised Domain Adaptation

Shenghua He, Weimin Zhou, Hua Li et al.

Medical imaging systems are commonly assessed by use of objective image quality measures. Supervised deep learning methods have been investigated to implement numerical observers for task-based image quality assessment. However, labeling large amounts of experimental data to train deep neural networks is tedious, expensive, and prone to subjective errors. Computer-simulated image data can potentially be employed to circumvent these issues; however, it is often difficult to computationally model complicated anatomical structures, noise sources, and the response of real world imaging systems. Hence, simulated image data will generally possess physical and statistical differences from the experimental image data they seek to emulate. Within the context of machine learning, these differences between the sets of two images is referred to as domain shift. In this study, we propose and investigate the use of an adversarial domain adaptation method to mitigate the deleterious effects of domain shift between simulated and experimental image data for deep learning-based numerical observers (DL-NOs) that are trained on simulated images but applied to experimental ones. In the proposed method, a DL-NO will initially be trained on computer-simulated image data and subsequently adapted for use with experimental image data, without the need for any labeled experimental images. As a proof of concept, a binary signal detection task is considered. The success of this strategy as a function of the degree of domain shift present between the simulated and experimental image data is investigated.

IVJan 26, 2020
Progressively-Growing AmbientGANs For Learning Stochastic Object Models From Imaging Measurements

Weimin Zhou, Sayantan Bhadra, Frank J. Brooks et al.

The objective optimization of medical imaging systems requires full characterization of all sources of randomness in the measured data, which includes the variability within the ensemble of objects to-be-imaged. This can be accomplished by establishing a stochastic object model (SOM) that describes the variability in the class of objects to-be-imaged. Generative adversarial networks (GANs) can be potentially useful to establish SOMs because they hold great promise to learn generative models that describe the variability within an ensemble of training data. However, because medical imaging systems record imaging measurements that are noisy and indirect representations of object properties, GANs cannot be directly applied to establish stochastic models of objects to-be-imaged. To address this issue, an augmented GAN architecture named AmbientGAN was developed to establish SOMs from noisy and indirect measurement data. However, because the adversarial training can be unstable, the applicability of the AmbientGAN can be potentially limited. In this work, we propose a novel training strategy---Progressive Growing of AmbientGANs (ProAGAN)---to stabilize the training of AmbientGANs for establishing SOMs from noisy and indirect imaging measurements. An idealized magnetic resonance (MR) imaging system and clinical MR brain images are considered. The proposed methodology is evaluated by comparing signal detection performance computed by use of ProAGAN-generated synthetic images and images that depict the true object properties.

CVNov 19, 2019
Dual affine moment invariants

You Hao, Hanlin Mo, Qi Li et al.

Affine transformation is one of the most common transformations in nature, which is an important issue in the field of computer vision and shape analysis. And affine transformations often occur in both shape and color space simultaneously, which can be termed as Dual-Affine Transformation (DAT). In general, we should derive invariants of different data formats separately, such as 2D color images, 3D color objects, or even higher-dimensional data. To the best of our knowledge, there is no general framework to derive invariants for all of these data formats. In this paper, we propose a general framework to derive moment invariants under DAT for objects in M-dimensional space with N channels, which can be called dual-affine moment invariants (DAMI). Following this framework, we present the generating formula of DAMI under DAT for 3D color objects. Then, we instantiated a complete set of DAMI for 3D color objects with orders and degrees no greater than 4. Finally, we analyze the characteristic of these DAMI and conduct classification experiments to evaluate the stability and discriminability of them. The results prove that DAMI is robust for DAT. Our derivation framework can be applied to data in any dimension with any number of channels.

CVNov 13, 2019
Rotation Differential Invariants of Images Generated by Two Fundamental Differential Operators

Hanlin Mo, Hua Li

In this paper, we design two fundamental differential operators for the derivation of rotation differential invariants of images. Each differential invariant obtained by using the new method can be expressed as a homogeneous polynomial of image partial derivatives, which preserve their values when the image is rotated by arbitrary angles. We produce all possible instances of homogeneous invariants up to the given order and degree, and discuss the independence of them in detail. As far as we know, no previous papers have published so many explicit forms of high-order rotation differential invariants of images. In the experimental part, texture classification and image patch verification are carried out on popular real databases. These rotation differential invariants are used as image feature vector. We mainly evaluate the effects of various factors on the performance of them. The experimental results also validate that they have better performance than some commonly used image features in some cases.

SPMay 15, 2019
Approximating the Ideal Observer and Hotelling Observer for binary signal detection tasks by use of supervised learning methods

Weimin Zhou, Hua Li, Mark A. Anastasio

It is widely accepted that optimization of medical imaging system performance should be guided by task-based measures of image quality (IQ). Task-based measures of IQ quantify the ability of an observer to perform a specific task such as detection or estimation of a signal (e.g., a tumor). For binary signal detection tasks, the Bayesian Ideal Observer (IO) sets an upper limit of observer performance and has been advocated for use in optimizing medical imaging systems and data-acquisition designs. Except in special cases, determination of the IO test statistic is analytically intractable. Markov-chain Monte Carlo (MCMC) techniques can be employed to approximate IO detection performance, but their reported applications have been limited to relatively simple object models. In cases where the IO test statistic is difficult to compute, the Hotelling Observer (HO) can be employed. To compute the HO test statistic, potentially large covariance matrices must be accurately estimated and subsequently inverted, which can present computational challenges. This work investigates supervised learning-based methodologies for approximating the IO and HO test statistics. Convolutional neural networks (CNNs) and single-layer neural networks (SLNNs) are employed to approximate the IO and HO test statistics, respectively. Numerical simulations were conducted for both signal-known-exactly (SKE) and signal-known-statistically (SKS) signal detection tasks. The performances of the supervised learning methods are assessed via receiver operating characteristic (ROC) analysis and the results are compared to those produced by use of traditional numerical methods or analytical calculations when feasible. The potential advantages of the proposed supervised learning approaches for approximating the IO and HO test statistics are discussed.

CVMar 4, 2019
Automatic microscopic cell counting by use of deeply-supervised density regression model

Shenghua He, Kyaw Thu Minn, Lilianna Solnica-Krezel et al.

Accurately counting cells in microscopic images is important for medical diagnoses and biological studies, but manual cell counting is very tedious, time-consuming, and prone to subjective errors, and automatic counting can be less accurate than desired. To improve the accuracy of automatic cell counting, we propose here a novel method that employs deeply-supervised density regression. A fully convolutional neural network (FCNN) serves as the primary FCNN for density map regression. Innovatively, a set of auxiliary FCNNs are employed to provide additional supervision for learning the intermediate layers of the primary CNN to improve network performance. In addition, the primary CNN is designed as a concatenating framework to integrate multi-scale features through shortcut connections in the network, which improves the granularity of the features extracted from the intermediate CNN layers and further supports the final density map estimation.

CVMar 1, 2019
Automatic microscopic cell counting by use of unsupervised adversarial domain adaptation and supervised density regression

Shenghua He, Kyaw Thu Minn, Lilianna Solnica-Krezel et al.

Accurate cell counting in microscopic images is important for medical diagnoses and biological studies. However, manual cell counting is very time-consuming, tedious, and prone to subjective errors. We propose a new density regression-based method for automatic cell counting that reduces the need to manually annotate experimental images. A supervised learning-based density regression model (DRM) is trained with annotated synthetic images (the source domain) and their corresponding ground truth density maps. A domain adaptation model (DAM) is built to map experimental images (the target domain) to the feature space of the source domain. By use of the unsupervised learning-based DAM and supervised learning-based DRM, a cell density map of a given target image can be estimated, from which the number of cells can be counted. Results from experimental immunofluorescent microscopic images of human embryonic stem cells demonstrate the promising performance of the proposed counting method.

CHEM-PHFeb 13, 2019
Machine Learning Allows Calibration Models to Predict Trace Element Concentration in Soil with Generalized LIBS Spectra

Chen Sun, Ye Tian, Liang Gao et al.

Calibration models have been developed for determination of trace elements, silver for instance, in soil using laser-induced breakdown spectroscopy (LIBS). The major concern is the matrix effect. Although it affects the accuracy of LIBS measurements in a general way, the effect appears accentuated for soil because of large variation of chemical and physical properties among different soils. The purpose is to reduce its influence in such way an accurate and soil-independent calibration model can be constructed. At the same time, the developed model should efficiently reduce experimental fluctuations affecting measurement precision. A univariate model first reveals obvious influence of matrix effect and important experimental fluctuation. A multivariate model has been then developed. A key point is the introduction of generalized spectrum where variables representing the soil type are explicitly included. Machine learning has been used to develop the model. After a necessary pretreatment where a feature selection process reduces the dimension of raw spectrum accordingly to the number of available spectra, the data have been fed in to a back-propagation neuronal networks (BPNN) to train and validate the model. The resulted soilindependent calibration model allows average relative error of calibration (REC) and average relative error of prediction (REP) within the range of 5-6%.