Xiuyi Jia

CV
h-index 10
14 papers
130 citations
Novelty 49%
AI Score 44

14 Papers

LG Oct 6, 2023 Code
Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering

Wei Lv, Chao Zhang, Huaxiong Li et al.

Incomplete multi-view clustering (IMVC) has received increasing attention because, in practice, some views of samples are often incomplete. Most existing methods learn similarity subgraphs from the original incomplete multi-view data and seek complete graphs by exploring the incomplete subgraphs of each view for spectral clustering. However, graphs constructed on the original high-dimensional data may be suboptimal due to feature redundancy and noise. Moreover, previous methods generally ignore the graph noise caused by inter-class and intra-class structure variation during the transformation between incomplete and complete graphs. To address these problems, we propose a novel Joint Projection Learning and Tensor Decomposition Based method (JPLTD) for IMVC. Specifically, to alleviate the influence of redundant features and noise in high-dimensional data, JPLTD introduces an orthogonal projection matrix to project the high-dimensional features into a lower-dimensional space for compact feature learning. Meanwhile, based on the lower-dimensional space, the similarity graphs corresponding to instances of different views are learned, and JPLTD stacks these graphs into a third-order low-rank tensor to explore the high-order correlations across different views. We further consider the graph noise of projected data caused by missing samples and use a tensor-decomposition based graph filter for robust clustering. JPLTD decomposes the original tensor into an intrinsic tensor and a sparse tensor. The intrinsic tensor models the true data similarities. An effective optimization algorithm is adopted to solve the JPLTD model. Comprehensive experiments on several benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD.

CV Mar 28, 2023
4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images

Zhuoran Zheng, Xiuyi Jia

Currently, mobile and IoT devices are in dire need of methods to enhance 4K images with limited resource expenditure. The absence of large-scale 4K benchmark datasets hampers progress in this area, especially for dehazing. The challenges in building ultra-high-definition (UHD) dehazing datasets are the absence of estimation methods for UHD depth maps, of high-quality 4K depth estimation datasets, and of migration strategies for UHD haze images from synthetic to real domains. To address these problems, we develop a novel synthetic method to simulate 4K hazy images (including nighttime and daytime scenes) from clear images, which first estimates the scene depth, simulates the light rays and object reflectance, then migrates the synthetic images to real domains by using a GAN, and finally yields the hazy effects on 4K resolution images. We wrap these synthesized images into a benchmark called the 4K-HAZE dataset. Specifically, we design the CS-Mixer (an MLP-based model that integrates the Channel domain and the Spatial domain) to estimate the depth map of 4K clear images, and the GU-Net to migrate a 4K synthetic image to the real hazy domain. The most appealing aspect of our approach (depth estimation and domain migration) is the capability to process a 4K image on a single GPU with 24 GB of RAM in real time (33 fps). Additionally, this work presents an objective assessment of several state-of-the-art single-image dehazing methods that are evaluated using the 4K-HAZE dataset. At the end of the paper, we discuss the limitations of the 4K-HAZE dataset and its social implications.
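
The step that turns a clear image and a depth map into a hazy image can be illustrated with the classic atmospheric scattering model, I = J·t + A·(1 − t) with t = exp(−β·d). This is only a minimal sketch under that assumption: the abstract does not state which haze formation model is used, and the function name `synthesize_haze` and the parameters `beta` and `airlight` are hypothetical. The GAN-based domain migration is not shown.

```python
import math

def synthesize_haze(clear, depth, beta=1.0, airlight=0.9):
    """Apply the atmospheric scattering model pixel by pixel.

    clear: 2D list of clear-image intensities in [0, 1]
    depth: 2D list of non-negative scene depths (same shape)
    beta, airlight: assumed scattering coefficient and global airlight
    """
    hazy = []
    for clear_row, depth_row in zip(clear, depth):
        hazy_row = []
        for j, d in zip(clear_row, depth_row):
            t = math.exp(-beta * d)  # transmission falls off with depth
            hazy_row.append(j * t + airlight * (1.0 - t))
        hazy.append(hazy_row)
    return hazy

clear = [[0.2, 0.8], [0.5, 0.1]]
depth = [[0.0, 1.0], [2.0, 50.0]]
hazy = synthesize_haze(clear, depth)
```

At zero depth the pixel is unchanged, and at very large depth it converges to the airlight value, which is the qualitative behavior expected of haze.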

CV Apr 20, 2023
Complex Mixer for MedMNIST Classification Decathlon

Zhuoran Zheng, Xiuyi Jia

As the medical imaging field develops, researchers have sought to build datasets that remove the need for medical background knowledge, such as MedMNIST (v2). MedMNIST (v2) includes a large number of small-sized (28 × 28 or 28 × 28 × 28) medical samples and the corresponding expert annotations (class labels). The existing baseline models (Google AutoML Vision, ResNet-50+3D) can reach an average accuracy of over 70% on the MedMNIST (v2) datasets, which is comparable to the performance of expert decision-making. Nevertheless, we note two major obstacles to modeling on MedMNIST (v2): 1) cropping the raw images to low scales may drop effective recognition information and make it difficult for the classifier to trace accurate decision boundaries; 2) the labelers' subjective insight may cause many uncertainties in the label space. To address these issues, we develop a Complex Mixer (C-Mixer) with a pre-training framework to alleviate the problem of insufficient information and uncertainty in the label space by introducing an incentive imaginary matrix and a self-supervised scheme with random masking. Our method (incentive learning and self-supervised learning with masking) shows surprising potential on the standard MedMNIST (v2) dataset, the customized weakly supervised datasets, and other image enhancement tasks.
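
The random masking used for self-supervised pre-training can be sketched as follows. This is an illustrative sketch only: the patch size, the masking ratio, and the choice to zero out masked pixels are assumptions, and `random_mask` is a hypothetical helper, not the authors' code.

```python
import random

def random_mask(image, patch=4, ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping patches.

    image: 2D list of floats; patch: patch side length;
    ratio: fraction of patches to mask (assumed value).
    Returns the masked copy and the top-left corners of masked patches.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    corners = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    masked = rng.sample(corners, int(len(corners) * ratio))
    out = [row[:] for row in image]  # leave the input untouched
    for r, c in masked:
        for i in range(r, min(r + patch, h)):
            for j in range(c, min(c + patch, w)):
                out[i][j] = 0.0
    return out, masked

image = [[1.0] * 28 for _ in range(28)]  # a 28 x 28 sample, as in MedMNIST
masked_image, masked_patches = random_mask(image, patch=4, ratio=0.5)
```

A model pre-trained this way would be asked to reconstruct the hidden patches from the visible ones; the reconstruction network itself is outside the scope of this sketch.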

LG Sep 28, 2022
Label Distribution Learning via Implicit Distribution Representation

Zhuoran Zheng, Xiuyi Jia

In contrast to multi-label learning, label distribution learning characterizes the polysemy of examples by a label distribution to represent richer semantics. In label distribution learning, the training data are collected mainly through manual annotation or generated by label enhancement algorithms. Unfortunately, the complexity of the manual annotation task or the inaccuracy of the label enhancement algorithm leads to noise and uncertainty in the label distribution training set. To alleviate this problem, we introduce the implicit distribution in the label distribution learning framework to characterize the uncertainty of each label value. Specifically, we use deep implicit representation learning to construct a label distribution matrix with Gaussian prior constraints, where each row component corresponds to the distribution estimate of each label value, and this row component is constrained by a prior Gaussian distribution to moderate the noise and uncertainty interference of the label distribution dataset. Finally, each row component of the label distribution matrix is transformed into a standard label distribution form by using the self-attention algorithm. In addition, several regularization techniques are applied in the training phase to improve the performance of the model.

CV Jun 8, 2022
Ultra-High-Definition Image Deblurring via Multi-scale Cubic-Mixer

Xingchi Chen, Xiuyi Jia, Zhuoran Zheng

Currently, transformer-based algorithms are making a splash in the domain of image deblurring. Their success depends on the self-attention mechanism with a CNN stem to model long-range dependencies between tokens. Unfortunately, this appealing pipeline introduces high computational complexity and makes it difficult to process an ultra-high-definition image on a single GPU in real time. To trade off accuracy and efficiency, the input degraded image is computed cyclically over three dimensions (C, W, and H) without a self-attention mechanism. We term this deep network Multi-scale Cubic-Mixer; it acts on both the real and imaginary components after the fast Fourier transform to estimate the Fourier coefficients and thus obtain a deblurred image. Furthermore, we combine the multi-scale cubic-mixer with a slicing strategy to generate high-quality results at a much lower computational cost. Experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art deblurring approaches on several benchmarks and a new ultra-high-definition dataset in terms of accuracy and speed.
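
The idea of operating separately on the real and imaginary components after a Fourier transform can be sketched on a 1D signal. This is a toy illustration under stated assumptions: it uses a naive discrete Fourier transform instead of an FFT, works in one dimension rather than on images, and the per-component functions `real_fn` and `imag_fn` are hypothetical stand-ins for the learned network.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n)
                for k in range(n)) for f in range(n)]

def idft(coeffs):
    """Inverse of dft, including the 1/n normalization."""
    n = len(coeffs)
    return [sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * k / n)
                for f in range(n)) / n for k in range(n)]

def process_in_frequency(signal, real_fn, imag_fn):
    """Transform, edit real and imaginary parts separately, transform back."""
    coeffs = dft(signal)
    real = real_fn([c.real for c in coeffs])
    imag = imag_fn([c.imag for c in coeffs])
    new_coeffs = [complex(r, i) for r, i in zip(real, imag)]
    return [v.real for v in idft(new_coeffs)]

identity = lambda xs: xs
signal = [0.0, 1.0, 4.0, 2.0]
restored = process_in_frequency(signal, identity, identity)
```

With identity functions the signal is recovered up to floating-point error, which confirms that any change in the output comes only from what the two per-component functions do to the coefficients.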

LG Oct 15, 2022
Label distribution learning via label correlation grid

Qimeng Guo, Zhuoran Zheng, Xiuyi Jia et al.

Label distribution learning can characterize the polysemy of an instance through label distributions. However, some noise and uncertainty may be introduced into the label space when processing label distribution data due to human or environmental factors. To alleviate this problem, we propose a Label Correlation Grid (LCG) to model the uncertainty of label relationships. Specifically, we compute a covariance matrix for the label space in the training set to represent the relationships between labels, then model the information distribution (Gaussian distribution function) for each element in the covariance matrix to obtain an LCG. Finally, our network learns the LCG to accurately estimate the label distribution for each instance. In addition, we propose a label distribution projection algorithm as a regularization term in the model training process. Extensive experiments verify the effectiveness of our method on several real benchmarks.
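
The first step described above, computing a covariance matrix over the label space of the training set, can be sketched as below. This is a minimal sketch assuming the unbiased sample covariance; the subsequent Gaussian modeling of each matrix element is not shown, and the function name `label_covariance` and the toy data are hypothetical.

```python
def label_covariance(label_dists):
    """Sample covariance between labels.

    label_dists: list of label distributions, one per training instance,
    each a list of k description degrees that sum to 1.
    Returns a k x k covariance matrix as nested lists.
    """
    n = len(label_dists)
    k = len(label_dists[0])
    means = [sum(row[j] for row in label_dists) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            cov[a][b] = sum((row[a] - means[a]) * (row[b] - means[b])
                            for row in label_dists) / (n - 1)
    return cov

train_labels = [[0.7, 0.2, 0.1],
                [0.1, 0.6, 0.3],
                [0.3, 0.3, 0.4]]
cov = label_covariance(train_labels)
```

Because every label distribution sums to 1, each row of this covariance matrix sums to zero, so the labels are necessarily negatively correlated on balance.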

LG Oct 25, 2022
TabMixer: Excavating Label Distribution Learning with Small-scale Features

Weiyi Cong, Zhuoran Zheng, Xiuyi Jia

Label distribution learning (LDL) differs from multi-label learning in that it represents the polysemy of instances by transforming single-label values into description degrees. Unfortunately, the feature space of a label distribution dataset is affected by human factors and the inductive bias of the feature extractor, which causes uncertainty in the feature space. In particular, for datasets with small-scale feature spaces (feature space dimension ≈ label space dimension), the existing LDL algorithms do not perform well. To address this issue, we model the uncertainty augmentation of the feature space to alleviate the problem in LDL tasks. Specifically, we start by augmenting each feature value in the feature vector of a sample into a vector (sampled from a Gaussian distribution). The variance parameter of the Gaussian distribution is learned by a sub-network, and the mean parameter is set to the feature value itself. Then, each feature vector is augmented to a matrix, which is fed into a mixer with local attention (TabMixer) to extract the latent feature. Finally, the latent feature is squeezed to yield an accurate label distribution via a squeezed network. Extensive experiments verify that our proposed algorithm is competitive with other LDL algorithms on several benchmarks.
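
The augmentation step, in which each scalar feature is expanded into a vector of Gaussian samples centered on that feature, can be sketched as follows. This is only an illustration: in the paper the variance is learned by a sub-network, whereas here the standard deviations are passed in as fixed numbers, and `augment_features`, `width`, and the toy values are hypothetical.

```python
import random

def augment_features(features, sigmas, width=8, seed=0):
    """Expand a feature vector of length d into a d x width matrix.

    Row i holds `width` samples from a Gaussian whose mean is features[i]
    and whose standard deviation is sigmas[i] (fixed here, learned in the paper).
    """
    rng = random.Random(seed)
    return [[rng.gauss(x, s) for _ in range(width)]
            for x, s in zip(features, sigmas)]

features = [0.5, -1.2, 3.0]
sigmas = [0.1, 0.1, 0.0]  # a zero deviation reproduces the feature exactly
matrix = augment_features(features, sigmas)
```

The resulting matrix is what a mixer-style network would consume in place of the original short feature vector.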

CV Aug 12, 2024
Boosting Adverse Weather Crowd Counting via Multi-queue Contrastive Learning

Tianhang Pan, Xiuyi Jia

Currently, most crowd counting methods have outstanding performance under normal weather conditions. However, our experimental validation reveals two key obstacles limiting the accuracy improvement of crowd counting models: 1) the domain gap between the adverse weather and the normal weather images; 2) the weather class imbalance in the training set. To address these problems, we propose a two-stage crowd counting method named Multi-queue Contrastive Learning (MQCL). Specifically, in the first stage, our target is to equip the backbone network with weather-awareness capabilities. In this process, we employ multi-queue MoCo, a contrastive learning method of our own design, to enable representation learning under weather class imbalance. After the first stage is completed, the backbone model is "mature" enough to extract weather-related representations. On this basis, we proceed to the second stage, in which we propose to refine the representations under the guidance of contrastive learning, enabling the conversion of the weather-aware representations to the normal weather domain. Through such representation and conversion, the model achieves robust counting performance under both normal and adverse weather conditions. Extensive experimental results show that, compared to the baseline, MQCL reduces the counting error under adverse weather conditions by 22% while introducing only about a 13% increase in computational burden, and achieves state-of-the-art performance.
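
Why separate queues help under class imbalance can be illustrated with a toy comparison between one shared key queue and one queue per weather class. This is a hedged sketch of the general idea only: the abstract does not describe how multi-queue MoCo fills or samples its queues, so the class `MultiQueue`, the queue sizes, and the toy stream are all assumptions.

```python
from collections import deque

class MultiQueue:
    """One fixed-size FIFO queue of keys per weather class (assumed layout)."""

    def __init__(self, classes, size):
        self.queues = {c: deque(maxlen=size) for c in classes}

    def enqueue(self, weather, feature):
        # Only keys of the same class can evict each other.
        self.queues[weather].append(feature)

mq = MultiQueue(["normal", "rain"], size=4)
single = deque(maxlen=8)  # shared queue with the same total capacity

# Imbalanced stream: 2 rainy samples followed by 100 normal ones.
stream = [("rain", i) for i in range(2)] + [("normal", i) for i in range(100)]
for weather, feature in stream:
    mq.enqueue(weather, feature)
    single.append((weather, feature))

rain_in_multi = len(mq.queues["rain"])
rain_in_single = sum(1 for w, _ in single if w == "rain")
```

In the shared queue the rare rainy keys are evicted by the flood of normal-weather keys, while the per-class queues retain them.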

CV Feb 3, 2024 Code
Polyp-DAM: Polyp segmentation via depth anything model

Zhuoran Zheng, Chen Wu, Wei Wang et al.

Recently, large models such as the Segment Anything Model have arrived on the scene and provided a new baseline for polyp segmentation tasks. This demonstrates that large models with a sufficient image-level prior can achieve promising performance on a given task. In this paper, we unfold a new perspective on polyp segmentation modeling by leveraging the Depth Anything Model (DAM) to provide a depth prior to polyp segmentation models. Specifically, the input polyp image is first passed through a frozen DAM to generate a depth map. The depth map and the input polyp image are then concatenated and fed into a multi-scale convolutional neural network to generate segmented images. Extensive experimental results demonstrate the effectiveness of our method, and in addition, we observe that our method still performs well on noisy polyp images. Our code is available at https://github.com/zzr-idam/Polyp-DAM.

31.0 CV Mar 11
UHD Image Deblurring via Autoregressive Flow with Ill-conditioned Constraints

Yucheng Xin, Dawei Zhao, Xiang Chen et al.

Ultra-high-definition (UHD) image deblurring poses significant challenges for UHD restoration methods, which must balance fine-grained detail recovery and practical inference efficiency. Although prominent discriminative and generative methods have achieved remarkable results, a trade-off persists between computational cost and the ability to generate fine-grained detail for UHD image deblurring tasks. To further alleviate these issues, we propose a novel autoregressive flow method for UHD image deblurring with an ill-conditioned constraint. Our core idea is to decompose UHD restoration into a progressive, coarse-to-fine process: at each scale, the sharp estimate is formed by upsampling the previous-scale result and adding a current-scale residual, enabling stable, stage-wise refinement from low to high resolution. We further introduce Flow Matching to model residual generation as a conditional vector field and perform few-step ODE sampling with efficient Euler/Heun solvers, enriching details while keeping inference affordable. Since multi-step generation at UHD can be numerically unstable, we propose an ill-conditioning suppression scheme by imposing condition-number regularization on a feature-induced attention matrix, improving convergence and cross-scale consistency. Our method demonstrates promising performance on blurred images at 4K (3840 × 2160) or higher resolutions.
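
The few-step ODE sampling with Euler and Heun solvers mentioned above can be illustrated on a toy problem. This sketch integrates the scalar ODE dx/dt = −x from t = 0 to t = 1 in four steps; the vector field `decay` is a hypothetical stand-in for the learned conditional vector field, and nothing here is taken from the authors' implementation.

```python
import math

def euler_step(f, x, t, h):
    """First-order update: follow the slope at the current point."""
    return x + h * f(x, t)

def heun_step(f, x, t, h):
    """Second-order update: average the slopes at both ends of the step."""
    k1 = f(x, t)
    k2 = f(x + h * k1, t + h)
    return x + 0.5 * h * (k1 + k2)

def integrate(step, f, x0, n_steps):
    """Integrate from t = 0 to t = 1 with a fixed number of steps."""
    x, h = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = step(f, x, i * h, h)
    return x

decay = lambda x, t: -x          # toy vector field with solution exp(-t)
exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 4) - exact)
heun_err = abs(integrate(heun_step, decay, 1.0, 4) - exact)
```

With the same number of steps, Heun's method costs two vector field evaluations per step instead of one but lands much closer to the exact solution, which is the usual motivation for using it when only a few steps are affordable.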

CV Aug 23, 2025 Code
Local Information Matters: A Rethink of Crowd Counting

Tianhang Pan, Xiuyi Jia

The motivation of this paper originates from rethinking an essential characteristic of crowd counting: individuals (heads of humans) in the crowd counting task typically occupy a very small portion of the image. This characteristic has never been the focus of existing works: they typically use the same backbone as other visual tasks and pursue a large receptive field. This drives us to propose a new model design principle for crowd counting: emphasizing the local modeling capability of the model. We follow the principle and design a crowd counting model named Local Information Matters Model (LIMM). The main innovation lies in two strategies: a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design to enhance the model's ability to distinguish between local density levels. Moreover, a global attention module is applied at the end of the model to handle the occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets show that the proposed model achieves a significant improvement in local modeling capability (for example, 8.7% in MAE on the JHU-Crowd++ high-density subset) without compromising its ability to count large-sized individuals, and achieves state-of-the-art performance. Code is available at: https://github.com/tianhangpan/LIMM.
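
The window partitioning design, which applies grid windows to the model input, can be sketched as below. This is a minimal sketch assuming non-overlapping square windows and an input whose sides are multiples of the window size; `partition_windows` is a hypothetical helper, and the window size used by LIMM is not given in the abstract.

```python
def partition_windows(image, win):
    """Split a 2D image into non-overlapping win x win windows.

    Windows are returned in row-major order over the grid.
    """
    h, w = len(image), len(image[0])
    if h % win or w % win:
        raise ValueError("image sides must be multiples of the window size")
    windows = []
    for r in range(0, h, win):
        for c in range(0, w, win):
            windows.append([row[c:c + win] for row in image[r:r + win]])
    return windows

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
windows = partition_windows(image, 2)
```

Each window can then be processed independently, which keeps the model's attention on small local regions where individual heads appear.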

CV Apr 21, 2025
Distribution-aware Dataset Distillation for Efficient Image Restoration

Zhuoran Zheng, Xin Su, Chen Wu et al.

With the exponential increase in image data, training an image restoration model is laborious. Dataset distillation is a potential solution to this problem, yet current distillation techniques are a blank canvas in the field of image restoration. To fill this gap, we propose the Distribution-aware Dataset Distillation method (TripleD), a new framework that extends the principles of dataset distillation to image restoration. Specifically, TripleD uses a pre-trained vision Transformer to extract features from images for complexity evaluation, and the subset (the number of samples is much smaller than the original training set) is selected based on complexity. The selected subset is then fed through a lightweight CNN that fine-tunes the image distribution to align with the distribution of the original dataset at the feature level. To efficiently condense knowledge, the training is divided into two stages. Early stages focus on simpler, low-complexity samples to build foundational knowledge, while later stages select more complex and uncertain samples as the model matures. Our method achieves promising performance on multiple image restoration tasks, including multi-task image restoration, all-in-one image restoration, and ultra-high-definition image restoration tasks. Note that we can train a state-of-the-art image restoration model on an ultra-high-definition (4K resolution) dataset using only one consumer-grade GPU in less than 8 hours (500 savings in computing resources and immeasurable training time).

CV Feb 13, 2022
Zero-Reference Image Restoration for Under-Display Camera of UAV

Zhuoran Zheng, Xiuyi Jia, Yunliang Zhuang

The exposed cameras of UAVs can shake, shift, or even malfunction under the influence of harsh weather, while the add-on devices (Dupont lines) are very vulnerable to damage. We can place a low-cost T-OLED overlay around the camera to protect it, but this also introduces image degradation issues. In particular, temperature variations in the atmosphere can create mist that adsorbs to the T-OLED, which can cause secondary problems (i.e., more severe image degradation) during the UAV's filming process. To solve the image degradation problem caused by overlaying T-OLEDs, in this paper we propose a new method to improve the visual experience by enhancing the texture and color of images. Specifically, our method trains a lightweight network to estimate a low-rank affine grid on the input image, and then uses the grid to enhance the input image at block granularity. The advantages of our method are that no reference image is required and the loss function is developed from visual experience. In addition, our model can perform high-quality recovery of images of arbitrary resolution in real time. In the end, the limitations of our model and the collected datasets (including daytime and nighttime scenes) are discussed.

LG Jun 19, 2019
Constrained Bilinear Factorization Multi-view Subspace Clustering

Qinghai Zheng, Jihua Zhu, Zhiqiang Tian et al.

Multi-view clustering is an important and fundamental problem. Many multi-view subspace clustering methods have been proposed, and most of them assume that all views share the same coefficient matrix. However, the underlying information of multi-view data is not fully exploited under this assumption, since the coefficient matrices of different views should have the same clustering properties rather than be identical across views. To this end, this paper proposes a novel Constrained Bilinear Factorization Multi-view Subspace Clustering (CBF-MSC) method. Specifically, bilinear factorization with an orthonormality constraint and a low-rank constraint is imposed on all coefficient matrices so that they have the same trace norm instead of being identical, so as to explore the consensus information of multi-view data more fully. Finally, an Augmented Lagrangian Multiplier (ALM) based algorithm is designed to optimize the objective function. Comprehensive experiments on nine benchmark datasets validate the effectiveness and competitiveness of the proposed approach compared with several state-of-the-art methods.