Guanfang Dong

CV · 12 papers · 41 citations

12 Papers

LG · Mar 16, 2022 · Code
Example Perplexity

Nevin L. Zhang, Weiyan Xie, Zhi Lin et al.

Some examples are easier for humans to classify than others. The same should be true for deep neural networks (DNNs). We use the term example perplexity to refer to the level of difficulty of classifying an example. In this paper, we propose a method to measure the perplexity of an example and investigate what factors contribute to high example perplexity. The related code and resources are available at https://github.com/vaynexie/Example-Perplexity.

CV · Apr 19, 2023 · Code
Learning Temporal Distribution and Spatial Correlation Towards Universal Moving Object Segmentation

Guanfang Dong, Chenqiu Zhao, Xichen Pan et al.

The goal of moving object segmentation is to separate moving objects from stationary backgrounds in videos. One major challenge is developing a universal model for videos from various natural scenes, since previous methods are often effective only in specific scenes. In this paper, we propose a method called Learning Temporal Distribution and Spatial Correlation (LTS) that has the potential to be a general solution for universal moving object segmentation. In the proposed approach, the distribution of temporal pixels is first learned by our Defect Iterative Distribution Learning (DIDL) network for scene-independent segmentation. Notably, the DIDL network incorporates an improved product distribution layer that we have newly derived. Then, the Stochastic Bayesian Refinement (SBR) network, which learns the spatial correlation, is proposed to improve the binary mask generated by the DIDL network. Benefiting from the scene independence of the temporal distribution and the accuracy gained from the spatial correlation, the proposed approach performs well, with fixed parameters, on almost all videos from diverse and complex natural scenes. Comprehensive experiments on standard datasets, including LASIESTA, CDNet2014, BMC, and SBMI2015, and on 128 real-world videos demonstrate the superiority of the proposed approach over state-of-the-art methods, both with and without deep learning networks. We believe this work has high potential to serve as a general solution for moving object segmentation in real-world environments. The code and real-world videos can be found on GitHub: https://github.com/guanfangdong/LTS-UniverisalMOS.
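
The abstract's key idea is that a pixel's distribution over time is scene-independent. A minimal sketch can show this without any network. It is a classical per-pixel histogram test, not the DIDL network. The function names, bin count, and threshold are hypothetical choices. The test flags the current intensity as moving when it is rare in that pixel's own history.

```python
def temporal_histogram(history, bins=16, max_value=256):
    """Normalized histogram of one pixel's intensities over past frames."""
    counts = [0] * bins
    width = max_value / bins
    for v in history:
        counts[min(int(v // width), bins - 1)] += 1
    return [c / len(history) for c in counts]


def is_moving(history, current, bins=16, max_value=256, threshold=0.05):
    """Hypothetical rule: foreground if the current value has low probability
    under the pixel's temporal distribution."""
    hist = temporal_histogram(history, bins, max_value)
    width = max_value / bins
    p = hist[min(int(current // width), bins - 1)]
    return p < threshold


history = [100, 102, 98, 101, 99, 103, 97, 100]  # a stable background pixel
print(is_moving(history, 100), is_moving(history, 230))
```

The decision depends only on the pixel's own history, never on scene-specific parameters. That is the property the abstract calls scene independence. The spatial refinement stage (SBR) would then clean up the resulting per-pixel mask.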

CV · Jun 4, 2022
SPGNet: Spatial Projection Guided 3D Human Pose Estimation in Low Dimensional Space

Zihan Wang, Ruimin Chen, Mengxuan Liu et al.

We propose SPGNet, a method for 3D human pose estimation that incorporates multi-dimensional re-projection into supervised learning. In this method, a 2D-to-3D lifting network predicts the global position and coordinates of the 3D human pose. Then, we re-project the estimated 3D pose back onto the 2D keypoints with spatial adjustments. The loss functions compare the estimated 3D pose with the 3D ground-truth pose, and the re-projected 2D pose with the input 2D pose. In addition, we propose a kinematic constraint that keeps the predicted human bone lengths constant. On the Human3.6M dataset, our approach outperforms many state-of-the-art methods both qualitatively and quantitatively.
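
The abstract describes three loss terms, sketched below under some assumptions. The first is a 3D error against ground truth. The second is a 2D error after re-projection. The third is a constraint keeping bone lengths constant across frames.

The assumptions are as follows:
- An ideal pinhole camera with joints given in camera coordinates.
- Mean per-joint position error (MPJPE) as the 3D term.
- Per-bone length variance over frames as the kinematic term.

The function names and loss weights are hypothetical, and the paper's exact formulation (including its "spatial adjustments") may differ.

```python
import math


def project(joint, focal=1.0):
    """Pinhole projection of a camera-space 3D joint (x, y, z) onto the image plane."""
    x, y, z = joint
    return (focal * x / z, focal * y / z)


def mpjpe(pred_3d, gt_3d):
    """Mean per-joint position error between predicted and ground-truth 3D joints."""
    return sum(math.dist(p, g) for p, g in zip(pred_3d, gt_3d)) / len(pred_3d)


def reprojection_loss(pred_3d, input_2d, focal=1.0):
    """Mean squared distance between re-projected 3D joints and the input 2D keypoints."""
    total = 0.0
    for joint, (u, v) in zip(pred_3d, input_2d):
        pu, pv = project(joint, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(pred_3d)


def bone_length_loss(frames, bones):
    """Variance of each bone's length over frames; zero when every bone length stays constant."""
    loss = 0.0
    for parent, child in bones:
        lengths = [math.dist(frame[parent], frame[child]) for frame in frames]
        mean = sum(lengths) / len(lengths)
        loss += sum((length - mean) ** 2 for length in lengths) / len(lengths)
    return loss / len(bones)


def total_loss(pred_3d, gt_3d, input_2d, frames, bones, w_2d=1.0, w_bone=1.0):
    """Hypothetical weighted sum of the three terms described in the abstract."""
    return (mpjpe(pred_3d, gt_3d)
            + w_2d * reprojection_loss(pred_3d, input_2d)
            + w_bone * bone_length_loss(frames, bones))
```

The re-projection term only makes sense because the network predicts a global position. Without depth (z) in camera coordinates, the predicted pose could not be projected back onto the image.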

LG · Aug 11, 2023
Learning Distributions via Monte-Carlo Marginalization

Chenqiu Zhao, Guanfang Dong, Anup Basu

We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the KL divergence. Two challenges arise from this idea. First, computing the KL divergence becomes prohibitively expensive as the dimensionality of the distributions increases. We propose Monte-Carlo Marginalization (MCMarg) to address this issue. The second challenge is keeping the optimization process differentiable, since the target distribution is intractable. We handle this problem by using Kernel Density Estimation (KDE). The proposed approach is a powerful tool for learning complex distributions, and the entire process is differentiable. Thus, it can serve as a better substitute for variational inference in variational auto-encoders (VAEs). One strong piece of evidence for the benefit of our method is that the distributions learned by the proposed approach generate better images, even with a pre-trained VAE decoder. Building on this, we devise a distribution learning auto-encoder that outperforms a VAE with the same network architecture. Experiments on standard datasets and synthetic data demonstrate the efficiency of the proposed approach.
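
The sketch below shows one way to read "Monte-Carlo Marginalization". Instead of comparing full high-dimensional densities, it compares 1D marginals along random directions.

Along any unit direction d, a diagonal-covariance GMM projects to a 1D GMM with means d·μ_k and variances Σ_i d_i² σ²_ki. The target's marginal is a KDE of the projected samples. Averaging the 1D KL over random directions gives a cheap objective. By the Cramér-Wold theorem, a distribution is determined by all of its 1D projections.

Several details are assumptions, not the paper's exact setup:
- The KL direction (target against model).
- The grid integration.
- The bandwidth and all function names.

The code only evaluates the objective. Minimizing it would require an autodiff framework.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate of 1D samples, evaluated at x."""
    return sum(normal_pdf(x, s, bandwidth ** 2) for s in samples) / len(samples)


def gmm_marginal_pdf(x, direction, weights, means, variances):
    """Density at x of a diagonal-covariance GMM projected onto a unit direction."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        m = sum(d * mi for d, mi in zip(direction, mu))
        v = sum(d * d * vi for d, vi in zip(direction, var))
        total += w * normal_pdf(x, m, v)
    return total


def random_direction(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def mc_marginal_kl(samples, weights, means, variances, n_directions=8,
                   bandwidth=0.3, grid_size=200, seed=0):
    """Average 1D KL(target marginal || GMM marginal) over random directions."""
    rng = random.Random(seed)
    dim = len(samples[0])
    total = 0.0
    for _ in range(n_directions):
        d = random_direction(dim, rng)
        proj = [sum(a * b for a, b in zip(d, s)) for s in samples]
        lo, hi = min(proj) - 3 * bandwidth, max(proj) + 3 * bandwidth
        step = (hi - lo) / grid_size
        kl = 0.0
        for i in range(grid_size):  # midpoint-rule integration on a 1D grid
            x = lo + (i + 0.5) * step
            p = kde_pdf(x, proj, bandwidth)
            q = gmm_marginal_pdf(x, d, weights, means, variances)
            kl += p * math.log((p + 1e-12) / (q + 1e-12)) * step
        total += kl
    return total / n_directions
```

Each direction costs only a 1D integral, independent of the data dimension. This is the complexity issue the abstract attributes to the full KL divergence.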

CV · Apr 17, 2023
Frequency Regularization: Restricting Information Redundancy of Convolutional Neural Networks

Chenqiu Zhao, Guanfang Dong, Shupei Zhang et al.

Convolutional neural networks have demonstrated impressive results in many computer vision tasks. However, the increasing size of these networks raises concerns about the information overload resulting from the large number of network parameters. In this paper, we propose Frequency Regularization to restrict the non-zero elements of the network parameters in the frequency domain. The proposed approach operates at the tensor level and can be applied to almost all network architectures. Specifically, the parameter tensors are maintained in the frequency domain, where high-frequency components can be eliminated by setting tensor elements to zero in zigzag order. Then, the inverse discrete cosine transform (IDCT) is used to reconstruct the spatial tensors for matrix operations during network training. Since high-frequency components of images are known to be less critical, a large proportion of these parameters can be set to zero when networks are trained with the proposed frequency regularization. Comprehensive evaluations on various state-of-the-art network architectures, including LeNet, AlexNet, VGG, ResNet, ViT, UNet, GAN, and VAE, demonstrate the effectiveness of the proposed frequency regularization. With a very small decrease in accuracy (less than 2%), a LeNet5 with 0.4M parameters can be represented by only 776 float16 numbers (over 1100× reduction), and a UNet with 34M parameters can be represented by only 759 float16 numbers (over 80000× reduction). In particular, the UNet model is reduced from its original size of 366MB to 4.5kb.
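
The sketch below shows the mechanism on a single square parameter matrix. The frequency-domain tensor is truncated to its first `keep` coefficients in JPEG-style zigzag order. An orthonormal 2D IDCT then rebuilds the spatial weights used in the forward pass.

It is a plain-Python illustration assuming a DCT-II basis. The helper names and the `keep` parameter are hypothetical, and it is not the authors' implementation. In training, the frequency matrix would be the trainable tensor. Because the IDCT is linear, gradients flow back to it.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(x):
    """2D DCT of a square matrix: C X C^T."""
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))


def idct2(f):
    """Inverse 2D DCT; the basis is orthonormal, so this is C^T F C."""
    c = dct_matrix(len(f))
    return matmul(matmul(transpose(c), f), c)


def zigzag_order(n):
    """(row, col) indices in JPEG-style zigzag order, lowest frequencies first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def zigzag_truncate(freq, keep):
    """Zero every coefficient after the first `keep` entries of the zigzag scan."""
    n = len(freq)
    out = [[0.0] * n for _ in range(n)]
    for i, j in zigzag_order(n)[:keep]:
        out[i][j] = freq[i][j]
    return out


# Forward pass for one layer: spatial weights rebuilt from truncated frequencies.
freq = dct2([[1.0, 2.0, 3.0, 4.0]] * 4)
weights = idct2(zigzag_truncate(freq, keep=3))
```

Only `keep` numbers need to be stored per tensor instead of n². This is where the large compression ratios reported in the abstract come from.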

IV · Oct 31, 2023
Medical Image Denosing via Explainable AI Feature Preserving Loss

Guanfang Dong, Anup Basu

Denoising algorithms play a crucial role in medical image processing and analysis. However, classical denoising algorithms often neglect the preservation of explanatory and critical medical features, which may lead to misdiagnosis and legal liabilities. In this work, we propose a new denoising method for medical images that not only efficiently removes various types of noise but also preserves key medical features throughout the process. To achieve this goal, we use a gradient-based eXplainable Artificial Intelligence (XAI) approach to design a feature-preserving loss function. Our feature-preserving loss function is motivated by the fact that gradient-based XAI is sensitive to noise. Through backpropagation, medical image features can be kept consistent before and after denoising. We conducted extensive experiments on three available medical image datasets, with 13 different types of synthesized noise and artifacts. The experimental results demonstrate the superiority of our method in terms of denoising performance, model explainability, and generalization.
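
The sketch below illustrates a gradient-based feature-preserving loss. It compares saliency maps (absolute input gradients of a model's score) of the clean and denoised images, and adds that difference to a pixel-wise error.

The choices here are hypothetical stand-ins:
- Central finite differences replace autograd.
- A toy quadratic scoring function replaces a pretrained classifier.
- The 0.1 weight and the function names are illustrative.

The paper's exact XAI method and loss may differ.

```python
def saliency(model, image, eps=1e-4):
    """Gradient-based saliency |d model / d pixel|, via central finite differences."""
    grads = []
    for i in range(len(image)):
        up = list(image)
        up[i] += eps
        down = list(image)
        down[i] -= eps
        grads.append(abs(model(up) - model(down)) / (2 * eps))
    return grads


def feature_preserving_loss(model, clean, denoised, weight=0.1):
    """Pixel MSE plus the mean absolute difference between the two saliency maps."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    s_clean = saliency(model, clean)
    s_denoised = saliency(model, denoised)
    feature = sum(abs(a - b) for a, b in zip(s_clean, s_denoised)) / len(clean)
    return mse + weight * feature


def toy_model(x):
    """Hypothetical differentiable score standing in for a trained classifier."""
    return sum(w * v * v for w, v in zip((1.0, 2.0, 0.5), x))


clean = [0.2, 0.4, 0.6]
print(feature_preserving_loss(toy_model, clean, [0.2, 0.0, 0.6]))
```

Because saliency maps are sensitive to noise, residual noise in the denoised image changes its saliency map. The extra term then penalizes it, even when the pixel-wise error alone is small.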

LG · Aug 29, 2023
Bridging Distribution Learning and Image Clustering in High-dimensional Space

Guanfang Dong, Chenqiu Zhao, Anup Basu

Distribution learning focuses on learning the probability density function from a set of data samples. In contrast, clustering aims to group similar objects together in an unsupervised manner. Usually, these two tasks are considered unrelated. However, the two may be indirectly related, with Gaussian Mixture Models (GMMs) acting as a bridge. In this paper, we explore the correlation between distribution learning and clustering, with the motivation of filling the gap between these two fields. We use an autoencoder (AE) to encode images into a high-dimensional latent space. Then, Monte-Carlo Marginalization (MCMarg) and a Kullback-Leibler (KL) divergence loss are used to fit the Gaussian components of the GMM and learn the data distribution. Finally, image clustering is achieved through the Gaussian components of the GMM. However, the "curse of dimensionality" poses severe challenges for most clustering algorithms. Experimental results show that, compared with the classic Expectation-Maximization (EM) algorithm, MCMarg and KL divergence greatly alleviate this difficulty. Based on these results, we believe distribution learning can exploit the potential of GMMs for image clustering in high-dimensional space.
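
Once a GMM has been fitted in the latent space, each Gaussian component can act as a cluster. Each latent vector is then assigned to the component with the highest posterior.

The minimal sketch below assumes diagonal covariances and already-fitted parameters; the fitting step (MCMarg plus KL) is not shown. The function names are hypothetical. Scores are computed in log space, because raw Gaussian densities underflow to zero in high dimensions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def assign_clusters(latents, weights, means, variances):
    """Label each latent vector with the GMM component of highest posterior."""
    labels = []
    for z in latents:
        scores = [math.log(w) + log_gaussian_diag(z, m, v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


latents = [[0.1, 0.0], [4.9, 5.2]]
print(assign_clusters(latents, [0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]))
```

The normalizing constant of the posterior is shared by all components, so the argmax of the unnormalized log scores gives the same label.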

LG · Aug 6, 2024
Deep Clustering via Distribution Learning

Guanfang Dong, Zijie Tan, Chenqiu Zhao et al.

Distribution learning finds probability density functions from a set of data samples, whereas clustering aims to group similar data points into clusters. Although some deep clustering methods employ distribution learning, past work still lacks a theoretical analysis of the relationship between clustering and distribution learning. Thus, in this work, we provide a theoretical analysis to guide the optimization of clustering via distribution learning. To achieve better results, we embed this theoretical analysis into deep clustering. Furthermore, distribution learning methods cannot always be applied directly to data. To overcome this issue, we introduce a clustering-oriented distribution learning method called Monte-Carlo Marginalization for Clustering. We integrate Monte-Carlo Marginalization for Clustering into deep clustering, resulting in Deep Clustering via Distribution Learning (DCDL). The proposed DCDL achieves promising results compared to state-of-the-art methods on popular datasets. On clustering tasks, the new distribution learning method also outperforms previous distribution learning methods.

CV · Aug 25, 2023
Is Deep Learning Network Necessary for Image Generation?

Chenqiu Zhao, Guanfang Dong, Anup Basu

Images are increasingly regarded as samples from a high-dimensional distribution, and deep learning has become almost synonymous with image generation. However, is a deep learning network truly necessary for image generation? In this paper, we investigate the possibility of generating images without a deep learning network, motivated by validating the assumption that images follow a high-dimensional distribution. Since images are assumed to be samples from such a distribution, we use a Gaussian Mixture Model (GMM) to describe it. In particular, we employ a recent distribution learning technique named Monte-Carlo Marginalization to estimate the parameters of the GMM from image samples. We also use Singular Value Decomposition (SVD) for dimensionality reduction to decrease computational complexity. In our experiments, we first model the distribution of image samples directly to verify the assumption that images truly follow a distribution. We then use SVD for dimensionality reduction, so that the principal components, rather than raw image data, are used for distribution learning. Compared to methods relying on deep learning networks, our approach is more explainable, and its performance is promising. Experiments show that our images have a lower FID than those generated by variational auto-encoders, demonstrating the feasibility of image generation without deep learning networks.
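
The generation step can be sketched without any network. Draw a coefficient vector from the GMM fitted in principal-component space, then map it back to pixels as x = mean + Σ_j z_j v_j.

The sketch assumes the following, without showing how they are computed:
- The SVD of the centered training images has already produced orthonormal basis vectors v_j.
- The GMM has already been fitted, with diagonal covariances.

All function names here are hypothetical.

```python
import random


def sample_gmm(weights, means, stds, rng):
    """Draw one coefficient vector from a diagonal-covariance GMM."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]


def reconstruct(z, basis, mean_image):
    """Map principal-component coefficients back to pixel space: mean + sum_j z_j * v_j."""
    x = list(mean_image)
    for coeff, v in zip(z, basis):
        for i in range(len(x)):
            x[i] += coeff * v[i]
    return x


def generate(n, weights, means, stds, basis, mean_image, seed=0):
    """Generate n flattened images by sampling in the reduced space."""
    rng = random.Random(seed)
    return [reconstruct(sample_gmm(weights, means, stds, rng), basis, mean_image)
            for _ in range(n)]


# Toy 3-pixel "images" with a 2-component basis (hypothetical values).
images = generate(2, [0.5, 0.5], [[1.0, 0.0], [-1.0, 0.5]], [[0.1, 0.1], [0.1, 0.1]],
                  [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5, 0.5])
```

The GMM only has to model k principal-component coefficients instead of every pixel. This is the computational saving the abstract attributes to SVD.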

CV · Dec 12, 2025
RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model

Guanfang Dong, Luke Schultz, Negar Hassanpour et al.

Semantically rich features from Vision Foundation Models (VFMs) have been leveraged to enhance Latent Diffusion Models (LDMs). However, raw VFM features are typically high-dimensional and redundant, which makes learning harder and reduces training efficiency for Diffusion Transformers (DiTs). In this paper, we propose RePack then Refine, a three-stage framework that brings semantically rich VFM features to DiT while further improving learning efficiency. Specifically, the RePack module projects the high-dimensional features onto a compact, low-dimensional manifold. This filters out redundancy while preserving essential structural information. A standard DiT is then trained for generative modeling on this highly compressed latent space. Finally, to restore the high-frequency details lost to compression in RePack, we propose a Latent-Guided Refiner, which is trained last to enhance image details. On ImageNet-1K, RePack-DiT-XL/1 achieves an FID of 1.82 in only 64 training epochs. With the Refiner module, performance further improves to an FID of 1.65, significantly surpassing the latest LDMs in convergence efficiency. Our results demonstrate that packing VFM features, followed by targeted refinement, is a highly effective strategy for balancing generative fidelity with training efficiency.

CV · Sep 1, 2023
Affine-Transformation-Invariant Image Classification by Differentiable Arithmetic Distribution Module

Zijie Tan, Guanfang Dong, Chenqiu Zhao et al.

Although Convolutional Neural Networks (CNNs) have achieved promising results in image classification, they remain vulnerable to affine transformations, including rotation, translation, flip, and shuffle. This drawback motivates us to design a module that alleviates the impact of different affine transformations. Thus, in this work, we introduce a more robust substitute by incorporating distribution learning techniques, focusing particularly on learning the spatial distribution information of pixels in images. To address the non-differentiability of prior distribution learning methods that rely on traditional histograms, we adopt Kernel Density Estimation (KDE) to formulate differentiable histograms. On this foundation, we present a novel Differentiable Arithmetic Distribution Module (DADM), which is designed to extract the intrinsic probability distributions from images. The proposed approach enhances the model's robustness to affine transformations without sacrificing its feature extraction capabilities, thus bridging the gap between traditional CNNs and distribution-based learning. We validate the effectiveness of the proposed approach through an ablation study and comparative experiments with LeNet.
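
A KDE-based histogram is differentiable where a traditional one is not. Hard binning has zero gradient almost everywhere. Here, instead, each pixel contributes a Gaussian kernel evaluated at every bin centre, so each bin value is a smooth function of the pixel intensities.

The minimal sketch below shows only that building block; it is not the DADM module itself. The bin count, bandwidth, and function name are hypothetical. Because the histogram ignores pixel positions, it is unchanged when pixels are shuffled or the image is flipped.

```python
import math


def soft_histogram(pixels, n_bins=8, bandwidth=0.1, lo=0.0, hi=1.0):
    """Differentiable histogram: every pixel adds a Gaussian kernel at each bin centre."""
    centers = [lo + (b + 0.5) * (hi - lo) / n_bins for b in range(n_bins)]
    hist = [sum(math.exp(-0.5 * ((p - c) / bandwidth) ** 2) for p in pixels)
            for c in centers]
    total = sum(hist)
    return [h / total for h in hist]  # normalized so the bins sum to 1


pixels = [0.1, 0.2, 0.2, 0.9, 0.5, 0.7]
print(soft_histogram(pixels))
```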

CV · Dec 21, 2021
Real-time Street Human Motion Capture

Yanquan Chen, Fei Yang, Tianyu Lang et al.

In recent years, computer-based motion capture technology has developed rapidly. Because of its high efficiency and excellent performance, it has replaced many traditional methods and is widely used in many fields. Our project concerns human motion capture and analysis in street-scene videos. The primary goal of the project is to capture human motion in a video and use the motion information to animate a 3D human model in real time. We applied a neural network for motion capture and implemented it in Unity within a street-view scene. By analyzing the motion data, we can better estimate street conditions, which is useful for other high-tech applications such as self-driving cars.