NAMay 18, 2017
Tensor absolute value equationsShouqiang Du, Liping Zhang, Chiyu Chen et al.
This paper is concerned with solving some structured multi-linear systems, which are called tensor absolute value equations. This kind of absolute value equations is closely related to tensor complementarity problems and is a generalization of the well-known absolute value equations in the matrix case. We prove that tensor absolute value equations are equivalent to some special structured tensor complementary problems. Some sufficient conditions are given to guarantee the existence of solutions for tensor absolute value equations. We also propose a Levenberg-Marquardt-type algorithm for solving some given tensor absolute value equations and preliminary numerical results are reported to indicate the efficiency of the proposed algorithm.
NAOct 28, 2011
The Dominant Eigenvalue of an Essentially Nonnegative TensorLiping Zhang, Liqun Qi, Ziyan Luo
It is well known that the dominant eigenvalue of a real essentially nonnegative matrix is a convex function of its diagonal entries. This convexity is of practical importance in population biology, graph theory, demography, analytic hierarchy process and so on. In this paper, the concept of essentially nonnegativity is extended from matrices to higher order tensors, and the convexity and log convexity of dominant eigenvalues for such a class of tensors are established. Particularly, for any nonnegative tensor, the spectral radius turns out to be the dominant eigenvalue and hence possesses these convexities. Finally, an algorithm is given to calculate the dominant eigenvalue, and numerical results are reported to show the effectiveness of the proposed algorithm.
CVNov 5, 2023
Deep Learning-based 3D Point Cloud Classification: A Systematic Survey and OutlookHuang Zhang, Changshuo Wang, Shengwei Tian et al.
In recent years, point cloud representation has become one of the research hotspots in the field of computer vision, and has been widely used in many fields, such as autonomous driving, virtual reality, robotics, etc. Although deep learning techniques have achieved great success in processing regular structured 2D grid image data, there are still great challenges in processing irregular, unstructured point cloud data. Point cloud classification is the basis of point cloud analysis, and many deep learning-based methods have been widely used in this task. Therefore, the purpose of this paper is to provide researchers in this field with the latest research progress and future trends. First, we introduce point cloud acquisition, characteristics, and challenges. Second, we review 3D data representations, storage formats, and commonly used datasets for point cloud classification. We then summarize deep learning-based methods for point cloud classification and complement recent research work. Next, we compare and analyze the performance of the main methods. Finally, we discuss some challenges and future directions for point cloud classification.
NAFeb 29, 2012
M-tensors and The Positive Definiteness of a Multivariate FormLiping Zhang, Liqun Qi, Guanglu Zhou
We study M-tensors and various properties of M-tensors are given. Specially, we show that the smallest real eigenvalue of M-tensor is positive corresponding to a nonnegative eigenvector. We propose an algorithm to find the smallest positive eigenvalue and then apply the property to study the positive definiteness of a multivariate form.
PFApr 28Code
PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance PredictionKaixuan Zhang, Yunfan Cui, Shuhao Zhang et al.
The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited, exhibiting poor generalization across hardware and inadequate modeling of complex production-level kernels common in modern inference stacks. To address these issues, we present PipeWeave, a unified GPU modeling framework. This approach first employs an analytical model to quantify a given kernel's demands on the GPU's heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model to capture complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types from four generations of major architectures on two widely-used serving systems demonstrates that PipeWeave delivers high fidelity and strong generalizability. It achieves accurate predictions, with only 6.1% average error at the kernel level and 8.5% for end-to-end inference -- reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate PipeWeave's value "beyond simulation" by utilizing its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup. Code is available https://github.com/zksainx/pipeweave.
IVAug 3, 2023
CartiMorph: a framework for automated knee articular cartilage morphometricsYongcheng Yao, Junru Zhong, Liping Zhang et al.
We introduce CartiMorph, a framework for automated knee articular cartilage morphometrics. It takes an image as input and generates quantitative metrics for cartilage subregions, including the percentage of full-thickness cartilage loss (FCL), mean thickness, surface area, and volume. CartiMorph leverages the power of deep learning models for hierarchical image feature representation. Deep learning models were trained and validated for tissue segmentation, template construction, and template-to-image registration. We established methods for surface-normal-based cartilage thickness mapping, FCL estimation, and rule-based cartilage parcellation. Our cartilage thickness map showed less error in thin and peripheral regions. We evaluated the effectiveness of the adopted segmentation model by comparing the quantitative metrics obtained from model segmentation and those from manual segmentation. The root-mean-squared deviation of the FCL measurements was less than 8%, and strong correlations were observed for the mean thickness (Pearson's correlation coefficient $ρ\in [0.82,0.97]$), surface area ($ρ\in [0.82,0.98]$) and volume ($ρ\in [0.89,0.98]$) measurements. We compared our FCL measurements with those from a previous study and found that our measurements deviated less from the ground truths. We observed superior performance of the proposed rule-based cartilage parcellation method compared with the atlas-based approach. CartiMorph has the potential to promote imaging biomarkers discovery for knee osteoarthritis.
LGJun 19, 2023
An Error Correction Mid-term Electricity Load Forecasting Model Based on Seasonal DecompositionLiping Zhang, Di Wu, Xin Luo
Mid-term electricity load forecasting (LF) plays a critical role in power system planning and operation. To address the issue of error accumulation and transfer during the operation of existing LF models, a novel model called error correction based LF (ECLF) is proposed in this paper, which is designed to provide more accurate and stable LF. Firstly, time series analysis and feature engineering act on the original data to decompose load data into three components and extract relevant features. Then, based on the idea of stacking ensemble, long short-term memory is employed as an error correction module to forecast the components separately, and the forecast results are treated as new features to be fed into extreme gradient boosting for the second-step forecasting. Finally, the component sub-series forecast results are reconstructed to obtain the final LF results. The proposed model is evaluated on real-world electricity load data from two cities in China, and the experimental results demonstrate its superior performance compared to the other benchmark models.
IVJun 20, 2023
CAMP-Net: Consistency-Aware Multi-Prior Network for Accelerated MRI ReconstructionLiping Zhang, Xiaobo Li, Weitian Chen
Undersampling k-space data in MRI reduces scan time but pose challenges in image reconstruction. Considerable progress has been made in reconstructing accelerated MRI. However, restoration of high-frequency image details in highly undersampled data remains challenging. To address this issue, we propose CAMP-Net, an unrolling-based Consistency-Aware Multi-Prior Network for accelerated MRI reconstruction. CAMP-Net leverages complementary multi-prior knowledge and multi-slice information from various domains to enhance reconstruction quality. Specifically, CAMP-Net comprises three interleaved modules for image enhancement, k-space restoration, and calibration consistency, respectively. These modules jointly learn priors from data in image domain, k-domain, and calibration region, respectively, in data-driven manner during each unrolled iteration. Notably, the encoded calibration prior knowledge extracted from auto-calibrating signals implicitly guides the learning of consistency-aware k-space correlation for reliable interpolation of missing k-space data. To maximize the benefits of image domain and k-domain prior knowledge, the reconstructions are aggregated in a frequency fusion module, exploiting their complementary properties to optimize the trade-off between artifact removal and fine detail preservation. Additionally, we incorporate a surface data fidelity layer during the learning of k-domain and calibration domain priors to prevent degradation of the reconstruction caused by padding-induced data imperfections. We evaluate the generalizability and robustness of our method on three large public datasets with varying acceleration factors and sampling patterns. The experimental results demonstrate that our method outperforms state-of-the-art approaches in terms of both reconstruction quality and $T_2$ mapping estimation, particularly in scenarios with high acceleration factors.
IVMar 23, 2023
Confidence-Driven Deep Learning Framework for Early Detection of Knee OsteoarthritisZhe Wang, Aladine Chetouani, Yung Hsin Chen et al.
Knee Osteoarthritis (KOA) is a prevalent musculoskeletal disorder that severely impacts mobility and quality of life, particularly among older adults. Its diagnosis often relies on subjective assessments using the Kellgren-Lawrence (KL) grading system, leading to variability in clinical evaluations. To address these challenges, we propose a confidence-driven deep learning framework for early KOA detection, focusing on distinguishing KL-0 and KL-2 stages. The Siamese-based framework integrates a novel multi-level feature extraction architecture with a hybrid loss strategy. Specifically, multi-level Global Average Pooling (GAP) layers are employed to extract features from varying network depths, ensuring comprehensive feature representation, while the hybrid loss strategy partitions training samples into high-, medium-, and low-confidence subsets. Tailored loss functions are applied to improve model robustness and effectively handle uncertainty in annotations. Experimental results on the Osteoarthritis Initiative (OAI) dataset demonstrate that the proposed framework achieves competitive accuracy, sensitivity, and specificity, comparable to those of expert radiologists. Cohen's kappa values (k > 0.85)) confirm substantial agreement, while McNemar's test (p > 0.05) indicates no statistically significant differences between the model and radiologists. Additionally, Confidence distribution analysis reveals that the model emulates radiologists' decision-making patterns. These findings highlight the potential of the proposed approach to serve as an auxiliary diagnostic tool, enhancing early KOA detection and reducing clinical workload.
NAAug 8, 2018
Randomized Core Reduction for Discrete Ill-Posed ProblemLiping Zhang, Yimin Wei
In this paper, we apply randomized algorithms to approximate the total least squares (TLS) solution of the problem $Ax\approx b$ in the large-scale discrete ill-posed problems. A regularization technique, based on the multiplicative randomization and the subspace iteration, is proposed to obtain the approximate core problem.In the error analysis, we provide upper bounds %in terms of the $(k\!\!+\!\!1)$-th singular value of $A$ for the errors of the solution and the residual of the randomized core reduction. Illustrative numerical examples and comparisons are presented.
SYApr 12
On the Optimization Landscape of Observer-based Dynamic Linear Quadratic ControlJingliang Duan, Jie Li, Yinsong Ma et al.
Understanding the optimization landscape of linear quadratic regulation (LQR) problems is fundamental to the design of efficient reinforcement learning solutions. Recent work has made significant progress in characterizing the landscape of static output-feedback control and linear quadratic Gaussian (LQG) control. For LQG, much of the analysis leverages the separation principle, which allows the controller and estimator to be designed independently. However, this simplification breaks down when the gradients with respect to the estimator and controller parameters are inherently coupled, leading to a more intricate analysis. This paper investigates the optimization landscape of observer-based dynamic output-feedback control of LQR problems. We derive the optimal observer-controller pair in settings where transient quadratic performance cannot be neglected. Our analysis reveals that, in general, the combination of the standard LQR controller and the observer that minimizes the trace of the accumulated estimation error covariance does not correspond to a stationary point of the overall closed-loop performance objective. Moreover, we derive a pair of discrete-time Sylvester equations with symmetric structure, both involving the same set of matrix elements, that characterize the stationary point of the observer-based dynamic LQR problem. These equations offer analytical insight into the structure of the optimality conditions and provide a foundation for developing numerical policy gradient methods aimed at learning complex controllers that rely on reconstructed state information.
DCJul 2, 2024
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on ModulesSuyi Li, Lingyun Yang, Xiaoxiao Jiang et al.
Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow where a base diffusion model is augmented with various "add-on" modules, notably ControlNet and LoRA, to enhance image generation control. Compared to serving the base model alone, these add-on modules introduce significant loading and computational overhead, resulting in increased latency. In this paper, we present SwiftDiffusion, a system that efficiently serves a T2I workflow through a holistic approach. SwiftDiffusion decouples ControNet from the base model and deploys it as a separate, independently scaled service on dedicated GPUs, enabling ControlNet caching, parallelization, and sharing. To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL) technique, allowing LoRA loading to overlap with the initial base model execution by up to k steps without compromising image quality. Furthermore, SwiftDiffusion optimizes base model execution with a novel latent parallelism technique. Collectively, these designs enable SwiftDiffusion to outperform the state-of-the-art T2I serving systems, achieving up to 7.8x latency reduction and 1.6x throughput improvement in serving SDXL models on H800 GPUs, without sacrificing image quality.
LGFeb 2
Dissecting Outlier Dynamics in LLM NVFP4 PretrainingPeijie Dong, Ruibo Fan, Yuechen Tao et al.
Training large language models using 4-bit arithmetic enhances throughput and memory efficiency. Yet, the limited dynamic range of FP4 increases sensitivity to outliers. While NVFP4 mitigates quantization error via hierarchical microscaling, a persistent loss gap remains compared to BF16. This study conducts a longitudinal analysis of outlier dynamics across architecture during NVFP4 pretraining, focusing on where they localize, why they occur, and how they evolve temporally. We find that, compared with Softmax Attention (SA), Linear Attention (LA) reduces per-tensor heavy tails but still exhibits persistent block-level spikes under block quantization. Our analysis attributes outliers to specific architectural components: Softmax in SA, gating in LA, and SwiGLU in FFN, with "post-QK" operations exhibiting higher sensitivity to quantization. Notably, outliers evolve from transient spikes early in training to a small set of persistent hot channels (i.e., channels with persistently large magnitudes) in later stages. Based on these findings, we introduce Hot-Channel Patch (HCP), an online compensation mechanism that identifies hot channels and reinjects residuals using hardware-efficient kernels. We then develop CHON, an NVFP4 training recipe integrating HCP with post-QK operation protection. On GLA-1.3B model trained for 60B tokens, CHON reduces the loss gap to BF16 from 0.94% to 0.58% while maintaining downstream accuracy.
PFApr 11
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuningKaixuan Zhang, Chutong Ding, Shiyou Qian et al.
The rapid adoption of Large Language Models (LLMs) has made GPU inference efficiency an increasingly critical system concern. The runtime of LLM workloads is largely dominated by tile-based kernels, particularly General Matrix Multiplications (GEMMs). Although these kernels are highly optimized, their performance remains sensitive to a large space of runtime parameters, such as tile sizes and pipeline stages. The interaction between these parameters and hardware resources leads to a non-convex optimization landscape. Existing approaches to parameter configuration -- including search-based auto-tuning, heuristic rules, and learned cost models -- face a fundamental trade-off between performance optimality and runtime efficiency. In this paper, we present WaveTune, a wave-aware framework for runtime kernel auto-tuning. First, we introduce a unified mapping method to handle input diversity and decompose the configuration space to manage high dimensionality. Second, we develop an analytical wave-aware bilinear model that accurately predicts kernel latency. Third, we design a sparse sampling scheme based on wave structures and a lightweight dual-table retrieval mechanism to minimize runtime overhead. As a result, WaveTune enables precise and efficient runtime configuration for GPU kernels. Across three representative kernels and five GPU architectures, WaveTune consistently achieves near-optimal kernel performance, delivering up to 1.83x kernel-level speedup and up to 1.33x end-to-end TTFT reduction, while reducing runtime decision overhead by five orders of magnitude compared to exhaustive search. These results demonstrate that WaveTune effectively eliminates the traditional trade-off between configuration latency and execution optimality, providing a practical and robust solution for high-performance LLM inference.
LGNov 6, 2025
Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement LearningJiaming Zhang, Yujie Yang, Haoning Wang et al.
Safe reinforcement learning (safe RL) aims to respect safety requirements while optimizing long-term performance. In many practical applications, however, the problem involves an infinite number of constraints, known as semi-infinite safe RL (SI-safe RL). Such constraints typically appear when safety conditions must be enforced across an entire continuous parameter space, such as ensuring adequate resource distribution at every spatial location. In this paper, we propose exchange policy optimization (EPO), an algorithmic framework that achieves optimal policy performance and deterministic bounded safety. EPO works by iteratively solving safe RL subproblems with finite constraint sets and adaptively adjusting the active set through constraint expansion and deletion. At each iteration, constraints with violations exceeding the predefined tolerance are added to refine the policy, while those with zero Lagrange multipliers are removed after the policy update. This exchange rule prevents uncontrolled growth of the working set and supports effective policy training. Our theoretical analysis demonstrates that, under mild assumptions, strategies trained via EPO achieve performance comparable to optimal solutions with global constraint violations strictly remaining within a prescribed bound.
AIMay 11
GESR: A Genetic Programming-Based Symbolic Regression Method with Gene EditingYanjie Li, Liping Zhang, Min Wu et al.
Mathematical formulas serve as a language through which humans communicate with nature. Discovering mathematical laws from scientific data to describe natural phenomena has been a long-standing pursuit of humanity for centuries. In the field of artificial intelligence, this challenge is known as the symbolic regression problem. Among existing symbolic regression approaches, Genetic Programming (GP) based on evolutionary algorithms remains one of the most classical and widely adopted methods. GP simulates the evolutionary process across generations through genetic mutation and crossover. However, mutations and crossovers in GP are entirely random. While this randomness effectively mimics natural evolution, it inevitably produces both beneficial and detrimental variations. If there existed a metaphorical `God` capable of foreseeing which genetic mutations or crossovers would yield superior outcomes and performing targeted gene editing accordingly, the efficiency of evolution could be substantially improved. Motivated by this idea, we propose in this paper a symbolic regression approach based on gene editing, termed GESR. In GESR, we trained two "hands of God" (two BERT models). Among them, the first leverages the BERT's masked language modeling capability to guide the mutation of genes (expression symbols). The other BERT model guides the crossover of individual genes by predicting the crossover point. Experimental results demonstrate that GESR significantly improves computational efficiency compared with traditional GP algorithms and achieves strong overall performance across multiple symbolic regression tasks.
IVFeb 26, 2023
Key-Exchange Convolutional Auto-Encoder for Data Augmentation in Early Knee Osteoarthritis DetectionZhe Wang, Aladine Chetouani, Mohamed Jarraya et al.
Knee Osteoarthritis (KOA) is a common musculoskeletal condition that significantly affects mobility and quality of life, particularly in elderly populations. However, training deep learning models for early KOA classification is often hampered by the limited availability of annotated medical datasets, owing to the high costs and labour-intensive nature of data labelling. Traditional data augmentation techniques, while useful, rely on simple transformations and fail to introduce sufficient diversity into the dataset. To address these challenges, we propose the Key-Exchange Convolutional Auto-Encoder (KECAE) as an innovative Artificial Intelligence (AI)-based data augmentation strategy for early KOA classification. Our model employs a convolutional autoencoder with a novel key-exchange mechanism that generates synthetic images by selectively exchanging key pathological features between X-ray images, which not only diversifies the dataset but also ensures the clinical validity of the augmented data. A hybrid loss function is introduced to supervise feature learning and reconstruction, integrating multiple components, including reconstruction, supervision, and feature separation losses. Experimental results demonstrate that the KECAE-generated data significantly improve the performance of KOA classification models, with accuracy gains of up to 1.98% across various standard and state-of-the-art architectures. Furthermore, a clinical validation study involving expert radiologists confirms the anatomical plausibility and diagnostic realism of the synthetic outputs. These findings highlight the potential of KECAE as a robust tool for augmenting medical datasets in early KOA detection.
IVOct 17, 2023
$k$-$t$ CLAIR: Self-Consistency Guided Multi-Prior Learning for Dynamic Parallel MR Image ReconstructionLiping Zhang, Weitian Chen
Cardiac magnetic resonance imaging (CMR) has been widely used in clinical practice for the medical diagnosis of cardiac diseases. However, the long acquisition time hinders its development in real-time applications. Here, we propose a novel self-consistency guided multi-prior learning framework named $k$-$t$ CLAIR to exploit spatiotemporal correlations from highly undersampled data for accelerated dynamic parallel MRI reconstruction. The $k$-$t$ CLAIR progressively reconstructs faithful images by leveraging multiple complementary priors learned in the $x$-$t$, $x$-$f$, and $k$-$t$ domains in an iterative fashion, as dynamic MRI exhibits high spatiotemporal redundancy. Additionally, $k$-$t$ CLAIR incorporates calibration information for prior learning, resulting in a more consistent reconstruction. Experimental results on cardiac cine and T1W/T2W images demonstrate that $k$-$t$ CLAIR achieves high-quality dynamic MR reconstruction in terms of both quantitative and qualitative performance.
IVJun 18, 2025Code
Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis GradingZhe Wang, Yuhua Ru, Aladine Chetouani et al.
Automated grading of Knee Osteoarthritis (KOA) from radiographs is challenged by significant inter-observer variability and the limited robustness of deep learning models, particularly near critical decision boundaries. To address these limitations, this paper proposes a novel framework, Diffusion-based Counterfactual Augmentation (DCA), which enhances model robustness and interpretability by generating targeted counterfactual examples. The method navigates the latent space of a diffusion model using a Stochastic Differential Equation (SDE), governed by balancing a classifier-informed boundary drive with a manifold constraint. The resulting counterfactuals are then used within a self-corrective learning strategy to improve the classifier by focusing on its specific areas of uncertainty. Extensive experiments on the public Osteoarthritis Initiative (OAI) and Multicenter Osteoarthritis Study (MOST) datasets demonstrate that this approach significantly improves classification accuracy across multiple model architectures. Furthermore, the method provides interpretability by visualizing minimal pathological changes and revealing that the learned latent space topology aligns with clinical knowledge of KOA progression. The DCA framework effectively converts model uncertainty into a robust training signal, offering a promising pathway to developing more accurate and trustworthy automated diagnostic systems. Our code is available at https://github.com/ZWang78/DCA.
IVApr 9, 2025Code
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-ResolutionZhe Wang, Yuhua Ru, Aladine Chetouani et al.
Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
IVJan 30, 2025Code
Distillation-Driven Diffusion Model for Multi-Scale MRI Super-Resolution: Make 1.5T MRI Great AgainZhe Wang, Yuhua Ru, Fabian Bauer et al.
Magnetic Resonance Imaging (MRI) offers critical insights into microstructural details, however, the spatial resolution of standard 1.5T imaging systems is often limited. In contrast, 7T MRI provides significantly enhanced spatial resolution, enabling finer visualization of anatomical structures. Though this, the high cost and limited availability of 7T MRI hinder its widespread use in clinical settings. To address this challenge, a novel Super-Resolution (SR) model is proposed to generate 7T-like MRI from standard 1.5T MRI scans. Our approach leverages a diffusion-based architecture, incorporating gradient nonlinearity correction and bias field correction data from 7T imaging as guidance. Moreover, to improve deployability, a progressive distillation strategy is introduced. Specifically, the student model refines the 7T SR task with steps, leveraging feature maps from the inference phase of the teacher model as guidance, aiming to allow the student model to achieve progressively 7T SR performance with a smaller, deployable model size. Experimental results demonstrate that our baseline teacher model achieves state-of-the-art SR performance. The student model, while lightweight, sacrifices minimal performance. Furthermore, the student model is capable of accepting MRI inputs at varying resolutions without the need for retraining, significantly further enhancing deployment flexibility. The clinical relevance of our proposed method is validated using clinical data from Massachusetts General Hospital. Our code is available at https://github.com/ZWang78/SR.
SEFeb 12, 2015Code
A Novel Approach for Clone Group Mapping by using Topic ModelingRuixia Zhang, Liping Zhang, Huan Wang et al.
Clone group mapping has a very important significance in the evolution of code clone. The topic modeling techniques were applied into code clone firstly and a new clone group mapping method was proposed. The method is very effective for not only Type-1 and Type-2 clone but also Type-3 clone .By making full use of the source text and structure information, topic modeling techniques transform the mapping problem of high-dimensional code space into a low-dimensional topic space, the goal of clone group mapping was indirectly reached by mapping clone group topics. Experiments on four open source software show that the recall and precision are up to 0.99, thus the method can effectively and accurately reach the goal of clone group mapping.
CVMar 16
KGS-GCN: Enhancing Sparse Skeleton Sensing via Kinematics-Driven Gaussian Splatting and Probabilistic Topology for Action RecognitionYuhan Chen, Yicui Shi, Guofa Li et al.
Skeleton-based action recognition is widely utilized in sensor systems including human-computer interaction and intelligent surveillance. Nevertheless, current sensor devices typically generate sparse skeleton data as discrete coordinates, which inevitably discards fine-grained spatiotemporal details during highly dynamic movements. Moreover, the rigid constraints of predefined physical sensor topologies hinder the modeling of latent long-range dependencies. To overcome these limitations, we propose KGS-GCN, a graph convolutional network that integrates kinematics-driven Gaussian splatting with probabilistic topology. Our framework explicitly addresses the challenges of sensor data sparsity and topological rigidity by transforming discrete joints into continuous generative representations. Firstly, a kinematics-driven Gaussian splatting module is designed to dynamically construct anisotropic covariance matrices using instantaneous joint velocity vectors. This module enhances visual representation by rendering sparse skeleton sequences into multi-view continuous heatmaps rich in spatiotemporal semantics. Secondly, to transcend the limitations of fixed physical connections, a probabilistic topology construction method is proposed. This approach generates an adaptive prior adjacency matrix by quantifying statistical correlations via the Bhattacharyya distance between joint Gaussian distributions. Ultimately, the GCN backbone is adaptively modulated by the rendered visual features via a visual context gating mechanism. Empirical results demonstrate that KGS-GCN significantly enhances the modeling of complex spatiotemporal dynamics. By addressing the inherent limitations of sparse inputs, our framework offers a robust solution for processing low-fidelity sensor data. This approach establishes a practical pathway for improving perceptual reliability in real-world sensing applications.
LGNov 13, 2025
EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM TrainingQingao Yi, Jiaang Duan, Hanwen Hu et al.
Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues, they still suffer from considerable communication overhead. Existing approaches primarily rely on static gradient compression to enhance communication efficiency; however, these methods neglect the dynamic nature of evolving gradients during training, leading to performance degradation. Accelerating LLM training via compression without sacrificing performance remains a challenge. In this paper, we propose an entropy-driven dynamic gradient compression framework called EDGC. The core concept is to adjust the compression rate during LLM training based on the evolving trends of gradient entropy, taking into account both compression efficiency and error. EDGC consists of three key components.First, it employs a down-sampling method to efficiently estimate gradient entropy, reducing computation overhead. Second, it establishes a theoretical model linking compression rate with gradient entropy, enabling more informed compression decisions. Lastly, a window-based adjustment mechanism dynamically adapts the compression rate across pipeline stages, improving communication efficiency and maintaining model performance. We implemented EDGC on a 32-NVIDIA-V100 cluster and a 64-NVIDIA-H100 cluster to train GPT2-2.5B and GPT2-12.1B, respectively. The results show that EDGC significantly reduces communication latency and training time by up to 46.45% and 16.13% while preserving LLM accuracy.
LGMay 1
Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement LearningJiaming Zhang, Yujie Yang, Yao Lyu et al.
Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Handling state-wise constraints with the Lagrangian method requires a distinct multiplier for every state, necessitating neural networks to approximate them as a multiplier network. However, applying standard dual gradient ascent to multiplier networks induces severe training oscillations. This is because the inherent instability of dual ascent is exacerbated by network generalization -- local overshoots and delayed updates propagate to adjacent states, further amplifying policy fluctuations. Existing stabilization techniques are designed for scalar multipliers, which are inadequate for state-dependent multiplier networks. To address this challenge, we propose an augmented Lagrangian multiplier network (ALaM) framework for stable learning of state-wise multipliers. ALaM consists of two key components. First, a quadratic penalty is introduced into the augmented Lagrangian to compensate for delayed multiplier updates and establish the local convexity near the optimum, thereby mitigating policy oscillations. Second, the multiplier network is trained via supervised regression toward a dual target, which stabilizes training and promotes convergence. Theoretically, we show that ALaM guarantees multiplier convergence and thus recovers the optimal policy of the constrained problem. Building on this framework, we integrate soft actor-critic (SAC) with ALaM to develop the SAC-ALaM algorithm. Experiments demonstrate that SAC-ALaM outperforms state-of-the-art safe RL baselines in both safety and return, while also stabilizing training dynamics and learning well-calibrated multipliers for risk identification.
LGJan 25, 2025
Predictive Lagrangian Optimization for Constrained Reinforcement LearningTianqi Zhang, Puzhen Yuan, Guojian Zhan et al.
Constrained optimization is popularly seen in reinforcement learning for addressing complex control tasks. From the perspective of dynamic system, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently use proportional and integral feedback controllers. In this paper, we propose a more generic equivalence framework to build the connection between constrained optimization and feedback control system, for the purpose of developing more effective constrained RL algorithms. Firstly, we define that each step of the system evolution determines the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP). In this problem, the control input is multiplier, the state is policy parameters, the dynamics is described by policy gradient descent, and the objective is to minimize constraint violations. Then, we introduce a multiplier guided policy learning (MGPL) module to perform policy parameters updating. And we prove that the resulting optimal policy, achieved through alternating MFOCP and MGPL, aligns with the solution of the primal constrained RL problem, thereby establishing our equivalence framework. Furthermore, we point out that the existing PID Lagrangian is merely one special case within our framework that utilizes a PID controller. We also accommodate the integration of other various feedback controllers, thereby facilitating the development of new algorithms. As a representative, we employ model predictive control (MPC) as the feedback controller and consequently propose a new algorithm called predictive Lagrangian optimization (PLO). Numerical experiments demonstrate its superiority over the PID Lagrangian method, achieving a larger feasible region up to 7.2% and a comparable average reward.
AIJan 3, 2024
A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable StructureYanjie Li, Weijun Li, Lina Yu et al.
Multilayer perception (MLP) has permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, MLP has obvious drawbacks. 1), The type of activation function is single and relatively fixed, which leads to poor `representation ability' of the network, and it is often to solve simple problems with complex networks; 2), the network structure is not adaptive, it is easy to cause network structure redundant or insufficient. In this work, we propose a novel neural network paradigm X-Net promising to replace MLPs. X-Net can dynamically learn activation functions individually based on derivative information during training to improve the network's representational ability for specific tasks. At the same time, X-Net can precisely adjust the network structure at the neuron level to accommodate tasks of varying complexity and reduce computational costs. We show that X-Net outperforms MLPs in terms of representational capability. X-Net can achieve comparable or even better performance than MLP with much smaller parameters on regression and classification tasks. Specifically, in terms of the number of parameters, X-Net is only 3% of MLP on average and only 1.1% under some tasks. We also demonstrate X-Net's ability to perform scientific discovery on data from various disciplines such as energy, environment, and aerospace, where X-Net is shown to help scientists discover new laws of mathematics or physics.
DCApr 9
LegoDiffusion: Micro-Serving Text-to-Image Diffusion WorkflowsLingyun Yang, Suyi Li, Tianyu Feng et al.
Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Collectively, LegoDiffusion outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.
LGMar 25, 2024
A Novel Loss Function-based Support Vector Machine for Binary ClassificationYan Li, Liping Zhang
The previous support vector machine(SVM) including $0/1$ loss SVM, hinge loss SVM, ramp loss SVM, truncated pinball loss SVM, and others, overlooked the degree of penalty for the correctly classified samples within the margin. This oversight affects the generalization ability of the SVM classifier to some extent. To address this limitation, from the perspective of confidence margin, we propose a novel Slide loss function ($\ell_s$) to construct the support vector machine classifier($\ell_s$-SVM). By introducing the concept of proximal stationary point, and utilizing the property of Lipschitz continuity, we derive the first-order optimality conditions for $\ell_s$-SVM. Based on this, we define the $\ell_s$ support vectors and working set of $\ell_s$-SVM. To efficiently handle $\ell_s$-SVM, we devise a fast alternating direction method of multipliers with the working set ($\ell_s$-ADMM), and provide the convergence analysis. The numerical experiments on real world datasets confirm the robustness and effectiveness of the proposed method.
DCMay 27, 2025
InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and SchedulingXiaoxiao Jiang, Suyi Li, Lingyun Yang et al.
Generative image editing using diffusion models has become a prevalent application in today's AI cloud services. In production environments, image editing typically involves a mask that specifies the regions of an image template to be edited. The use of masks provides direct control over the editing process and introduces sparsity in the model inference. In this paper, we present InstGenIE, a system that efficiently serves image editing requests. The key insight behind InstGenIE is that image editing only modifies the masked regions of image templates while preserving the original content in the unmasked areas. Driven by this insight, InstGenIE judiciously skips redundant computations associated with the unmasked areas by reusing cached intermediate activations from previous inferences. To mitigate the high cache loading overhead, InstGenIE employs a bubble-free pipeline scheme that overlaps computation with cache loading. Additionally, to reduce queuing latency in online serving while improving the GPU utilization, InstGenIE proposes a novel continuous batching strategy for diffusion model serving, allowing newly arrived requests to join the running batch in just one step of denoising computation, without waiting for the entire batch to complete. As heterogeneous masks induce imbalanced loads, InstGenIE also develops a load balancing strategy that takes into account the loads of both computation and cache loading. Collectively, InstGenIE outperforms state-of-the-art diffusion serving systems for image editing, achieving up to 3x higher throughput and reducing average request latency by up to 14.7x while ensuring image quality.
IVJan 16, 2025
Domain-conditioned and Temporal-guided Diffusion Modeling for Accelerated Dynamic MRI ReconstructionLiping Zhang, Iris Yuwen Zhou, Sydney B. Montesi et al.
Purpose: To propose a domain-conditioned and temporal-guided diffusion modeling method, termed dynamic Diffusion Modeling (dDiMo), for accelerated dynamic MRI reconstruction, enabling diffusion process to characterize spatiotemporal information for time-resolved multi-coil Cartesian and non-Cartesian data. Methods: The dDiMo framework integrates temporal information from time-resolved dimensions, allowing for the concurrent capture of intra-frame spatial features and inter-frame temporal dynamics in diffusion modeling. It employs additional spatiotemporal ($x$-$t$) and self-consistent frequency-temporal ($k$-$t$) priors to guide the diffusion process. This approach ensures precise temporal alignment and enhances the recovery of fine image details. To facilitate a smooth diffusion process, the nonlinear conjugate gradient algorithm is utilized during the reverse diffusion steps. The proposed model was tested on two types of MRI data: Cartesian-acquired multi-coil cardiac MRI and Golden-Angle-Radial-acquired multi-coil free-breathing lung MRI, across various undersampling rates. Results: dDiMo achieved high-quality reconstructions at various acceleration factors, demonstrating improved temporal alignment and structural recovery compared to other competitive reconstruction methods, both qualitatively and quantitatively. This proposed diffusion framework exhibited robust performance in handling both Cartesian and non-Cartesian acquisitions, effectively reconstructing dynamic datasets in cardiac and lung MRI under different imaging conditions. Conclusion: This study introduces a novel diffusion modeling method for dynamic MRI reconstruction.
CVOct 25, 2021
Learning Continuous Face Representation with Explicit FunctionsLiping Zhang, Weijun Li, Linjun Sun et al.
How to represent a face pattern? While it is presented in a continuous way in our visual system, computers often store and process the face image in a discrete manner with 2D arrays of pixels. In this study, we attempt to learn a continuous representation for face images with explicit functions. First, we propose an explicit model (EmFace) for human face representation in the form of a finite sum of mathematical terms, where each term is an analytic function element. Further, to estimate the unknown parameters of EmFace, a novel neural network, EmNet, is designed with an encoder-decoder structure and trained using the backpropagation algorithm, where the encoder is defined by a deep convolutional neural network and the decoder is an explicit mathematical expression of EmFace. Experimental results show that EmFace has a higher representation performance on faces with various expressions, postures, and other factors, compared to that of other methods. Furthermore, EmFace achieves reasonable performance on several face image processing tasks, including face image restoration, denoising, and transformation.
IVJun 21, 2021
Context-aware PolyUNet for Liver and Lesion Segmentation from Abdominal CT ImagesLiping Zhang, Simon Chun-Ho Yu
Accurate liver and lesion segmentation from computed tomography (CT) images are highly demanded in clinical practice for assisting the diagnosis and assessment of hepatic tumor disease. However, automatic liver and lesion segmentation from contrast-enhanced CT volumes is extremely challenging due to the diversity in contrast, resolution, and quality of images. Previous methods based on UNet for 2D slice-by-slice or 3D volume-by-volume segmentation either lack sufficient spatial contexts or suffer from high GPU computational cost, which limits the performance. To tackle these issues, we propose a novel context-aware PolyUNet for accurate liver and lesion segmentation. It jointly explores structural diversity and consecutive t-adjacent slices to enrich feature expressive power and spatial contextual information while avoiding the overload of GPU memory consumption. In addition, we utilize zoom out/in and two-stage refinement strategy to exclude the irrelevant contexts and focus on the specific region for the fine-grained segmentation. Our method achieved very competitive performance at the MICCAI 2017 Liver Tumor Segmentation (LiTS) Challenge among all tasks with a single model and ranked the $3^{rd}$, $12^{th}$, $2^{nd}$, and $5^{th}$ places in the liver segmentation, lesion segmentation, lesion detection, and tumor burden estimation, respectively.
CRAug 26, 2020
An Energy Efficient Authentication Scheme using Chebyshev Chaotic Map for Smart Grid EnvironmentLiping Zhang, Yue Zhu, Wei Ren et al.
As one of the important applications of Smart grid, charging between electric vehicles has attracted much attention. However, authentication between vehicle users and an aggregator may be vulnerable to various attacks due to the usage of wireless communications. In order to reduce the computational costs yet preserve required security, the Chebyshev chaotic map based authentication schemes are proposed. However, the security requirements of Chebyshev polynomials bring a new challenge to the design of authentication schemes based on Chebyshev chaotic maps. To solve this issue, we propose a practical Chebyshev polynomial algorithm by using a binary exponentiation algorithm based on square matrix to achieve secure and efficient Chebyshev polynomial computation. We further apply the proposed algorithm to construct an energy-efficient authentication and key agreement scheme for smart grid environments. Compared with state-of-the-art schemes, the proposed authentication scheme effectively reduces the computational costs and communication costs by adopting the proposed Chebyshev polynomial algorithm. Furthermore, the ProVerif tool is employed to analyze the security of the proposed authentication scheme. Our experimental results justified that our proposed authentication scheme can outperform state-of-the-art schemes in terms of the computational overhead while achieving privacy protection.
CVAug 3, 2020
GmFace: A Mathematical Model for Face Image Representation Using Multi-GaussianLiping Zhang, Weijun Li, Lina Yu et al.
Establishing mathematical models is a ubiquitous and effective method to understand the objective world. Due to complex physiological structures and dynamic behaviors, mathematical representation of the human face is an especially challenging task. A mathematical model for face image representation called GmFace is proposed in the form of a multi-Gaussian function in this paper. The model utilizes the advantages of two-dimensional Gaussian function which provides a symmetric bell surface with a shape that can be controlled by parameters. The GmNet is then designed using Gaussian functions as neurons, with parameters that correspond to each of the parameters of GmFace in order to transform the problem of GmFace parameter solving into a network optimization problem of GmNet. The face modeling process can be described by the following steps: (1) GmNet initialization; (2) feeding GmNet with face image(s); (3) training GmNet until convergence; (4) drawing out the parameters of GmNet (as the same as GmFace); (5) recording the face model GmFace. Furthermore, using GmFace, several face image transformation operations can be realized mathematically through simple parameter computation.
CVApr 16, 2020
A Local Descriptor with Physiological Characteristic for Finger Vein RecognitionLiping Zhang, Weijun Li, Xin Ning
Local feature descriptors exhibit great superiority in finger vein recognition due to their stability and robustness against local changes in images. However, most of these are methods use general-purpose descriptors that do not consider finger vein-specific features. In this work, we propose a finger vein-specific local feature descriptors based physiological characteristic of finger vein patterns, i.e., histogram of oriented physiological Gabor responses (HOPGR), for finger vein recognition. First, prior of directional characteristic of finger vein patterns is obtained in an unsupervised manner. Then the physiological Gabor filter banks are set up based on the prior information to extract the physiological responses and orientation. Finally, to make feature has robustness against local changes in images, histogram is generated as output by dividing the image into non-overlapping cells and overlapping blocks. Extensive experimental results on several databases clearly demonstrate that the proposed method outperforms most current state-of-the-art finger vein recognition methods.
MLMar 26, 2019
Improve Diverse Text Generation by Self Labeling Conditional Variational Auto EncoderYuchi Zhang, Yongliang Wang, Liping Zhang et al.
Diversity plays a vital role in many text generating applications. In recent years, Conditional Variational Auto Encoders (CVAE) have shown promising performances for this task. However, they often encounter the so called KL-Vanishing problem. Previous works mitigated such problem by heuristic methods such as strengthening the encoder or weakening the decoder while optimizing the CVAE objective function. Nevertheless, the optimizing direction of these methods are implicit and it is hard to find an appropriate degree to which these methods should be applied. In this paper, we propose an explicit optimizing objective to complement the CVAE to directly pull away from KL-vanishing. In fact, this objective term guides the encoder towards the "best encoder" of the decoder to enhance the expressiveness. A labeling network is introduced to estimate the "best encoder". It provides a continuous label in the latent space of CVAE to help build a close connection between latent variables and targets. The whole proposed method is named Self Labeling CVAE~(SLCVAE). To accelerate the research of diverse text generation, we also propose a large native one-to-many dataset. Extensive experiments are conducted on two tasks, which show that our method largely improves the generating diversity while achieving comparable accuracy compared with state-of-art algorithms.
CVJan 13, 2019
The Liver Tumor Segmentation Benchmark (LiTS)Patrick Bilic, Patrick Christ, Hongwei Bran Li et al.
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dices scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in \url{http://medicaldecathlon.com/}. In addition, both data and online evaluation are accessible via \url{www.lits-challenge.com}.
CVSep 3, 2018
Optical Flow Super-Resolution Based on Image Guidence Using Convolutional Neural NetworkLiping Zhang, Zongqing Lu, Qingmin Liao
The convolutional neural network model for optical flow estimation usually outputs a low-resolution(LR) optical flow field. To obtain the corresponding full image resolution,interpolation and variational approach are the most common options, which do not effectively improve the results. With the motivation of various convolutional neural network(CNN) structures succeeded in single image super-resolution(SISR) task, an end-to-end convolutional neural network is proposed to reconstruct the high resolution(HR) optical flow field from initial LR optical flow with the guidence of the first frame used in optical flow estimation. Our optical flow super-resolution(OFSR) problem differs from the general SISR problem in two main aspects. Firstly, the optical flow includes less texture information than image so that the SISR CNN structures can't be directly used in our OFSR problem. Secondly, the initial LR optical flow data contains estimation error, while the LR image data for SISR is generally a bicubic downsampled, blurred, and noisy version of HR ground truth. We evaluate the proposed approach on two different optical flow estimation mehods and show that it can not only obtain the full image resolution, but generate more accurate optical flow field (Accuracy improve 15% on FlyingChairs, 13% on MPI Sintel) with sharper edges than the estimation result of original method.