CV Sep 15, 2023
Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models
Feihong He, Gang Li, Lingyu Si et al.
Image cartoonization has attracted significant interest in the field of image generation. However, most existing image cartoonization techniques require re-training models on cartoon-style images. In this paper, we present CartoonDiff, a novel training-free sampling approach that performs image cartoonization using diffusion transformer models. Specifically, we decompose the reverse process of diffusion models into a semantic generation phase and a detail generation phase. Furthermore, we implement the image cartoonization process by normalizing the high-frequency signal of the noisy image at specific denoising steps. CartoonDiff does not require any additional reference images, complex model designs, or the tedious adjustment of multiple parameters. Extensive experimental results demonstrate the effectiveness of CartoonDiff. The project page is available at: https://cartoondiff.github.io/
CV Jun 28, 2023
A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning
Lingyu Si, Hongwei Dong, Wenwen Qiang et al.
Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.
CV Aug 30, 2023
Background Debiased SAR Target Recognition via Causal Interventional Regularizer
Hongwei Dong, Fangzhou Han, Lingyu Si et al.
Recent studies have utilized deep learning (DL) techniques to automatically extract features from synthetic aperture radar (SAR) images, which shows great promise for enhancing the performance of SAR automatic target recognition (ATR). However, our research reveals a previously overlooked issue: SAR images to be recognized include not only the foreground (i.e., the target), but also a certain size of the background area. When a DL-model is trained exclusively on foreground data, its recognition performance is significantly superior to a model trained on original data that includes both foreground and background. This suggests that the presence of background impedes the ability of the DL-model to learn additional semantic information about the target. To address this issue, we construct a structural causal model (SCM) that incorporates the background as a confounder. Based on the constructed SCM, we propose a causal intervention based regularization method to eliminate the negative impact of background on feature semantic learning and achieve background debiased SAR-ATR. The proposed causal interventional regularizer can be integrated into any existing DL-based SAR-ATR models to mitigate the impact of background interference on the feature extraction and recognition accuracy. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset indicate that the proposed method can enhance the efficiency of existing DL-based methods in a plug-and-play manner.
CV May 9, 2024 Code
Exploring Text-Guided Single Image Editing for Remote Sensing Images
Fangzhou Han, Lingyu Si, Zhizhuo Jiang et al.
Artificial intelligence generative content (AIGC) has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pre-trained on large-scale benchmark datasets and text guidance facilitated by vision-language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic corresponds to multiple image semantics, leading to the introduction of incorrect semantics. To solve the above problems, this paper proposes a text-guided RSI editing method that can be trained using only a single image. A multi-scale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while leveraging RSI pre-trained VLMs and prompt ensembling (PE) to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. Additionally, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality. Code will be released at https://github.com/HIT-PhilipHan/remote_sensing_image_editing.
CV Jun 27, 2020
Unsupervised Deep Representation Learning and Few-Shot Classification of PolSAR Images
Lamei Zhang, Siyu Zhang, Bin Zou et al.
Deep learning and convolutional neural networks (CNNs) have made progress in polarimetric synthetic aperture radar (PolSAR) image classification over the past few years. However, a crucial issue has not been addressed, i.e., the requirement of CNNs for abundant labeled samples versus the insufficient human annotations of PolSAR images. It is well known that following the supervised learning paradigm may lead to overfitting to the training data, and the lack of supervision information for PolSAR images undoubtedly aggravates this problem, which greatly affects the generalization performance of CNN-based classifiers in large-scale applications. To handle this problem, in this paper, learning transferable representations from unlabeled PolSAR data through convolutional architectures is explored for the first time. Specifically, a PolSAR-tailored contrastive learning network (PCLNet) is proposed for unsupervised deep PolSAR representation learning and few-shot classification. Rather than directly reusing optical processing methods, a diversity stimulation mechanism is constructed to narrow the application gap between optics and PolSAR. Beyond the conventional supervised methods, PCLNet develops an unsupervised pre-training phase based on the proxy objective of instance discrimination to learn useful representations from unlabeled PolSAR data. The acquired representations are transferred to the downstream task, i.e., few-shot PolSAR classification. Experiments on two widely used PolSAR benchmark datasets confirm the validity of PCLNet. Besides, this work may shed light on how to efficiently utilize the massive unlabeled PolSAR data to alleviate the greedy demands of CNN-based methods for human annotations.
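The instance-discrimination objective mentioned in the abstract above can be illustrated with a generic contrastive (InfoNCE-style) loss. The abstract does not state the exact loss used by PCLNet, so the following is only a minimal sketch under that assumption; the function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not necessarily PCLNet's loss).

    The anchor should be similar to its positive (another view of the same
    instance) and dissimilar to the negatives (other instances).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]

# Toy 2-D "features": the loss is small when the positive matches the anchor.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_good = info_nce(anchor, [0.9, 0.1], negatives)
loss_bad = info_nce(anchor, [0.1, 0.9], negatives)
print(loss_good < loss_bad)
```

In this toy setting, a positive that lies close to the anchor yields a lower loss than one that lies close to a negative, which is the behaviour an instance-discrimination pre-training objective rewards.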
CV Nov 16, 2019
Automatic Design of CNNs via Differentiable Neural Architecture Search for PolSAR Image Classification
Hongwei Dong, Siyu Zhang, Bin Zou et al.
Convolutional neural networks (CNNs) have shown good performance in polarimetric synthetic aperture radar (PolSAR) image classification due to the automation of feature engineering. Excellent hand-crafted CNN architectures incorporate the wisdom of human experts, which is an important reason for the success of CNNs. However, designing these architectures is a difficult problem that requires a lot of professional knowledge as well as computational resources. Moreover, an architecture designed by hand might be suboptimal, because it is only one of thousands of unobserved but objectively existing paths. Considering that the success of deep learning is largely due to its automation of the feature engineering process, how to design automatic architecture search methods to replace the hand-crafted ones is an interesting topic. In this paper, we explore the application of neural architecture search (NAS) in the PolSAR area for the first time. Rather than reusing existing NAS methods, we propose a differentiable architecture search (DAS) method that is customized for PolSAR classification. The proposed DAS is equipped with a PolSAR-tailored search space and an improved one-shot search strategy. With DAS, the weight parameters and architecture parameters (corresponding to the hyperparameters rather than the topologies) can be optimized by stochastic gradient descent during training. The optimized architecture parameters are then transformed into the corresponding CNN architecture, which is re-trained to achieve high-precision PolSAR classification. In addition, complex-valued DAS is developed to take into account the characteristics of PolSAR images so as to further improve the performance. Experiments on three PolSAR benchmark datasets show that the CNNs obtained by searching have better classification performance than the hand-crafted ones.
CV Jun 11, 2019
Band Attention Convolutional Networks For Hyperspectral Image Classification
Hongwei Dong, Lamei Zhang, Bin Zou
Redundancy and noise exist in the bands of hyperspectral images (HSIs). Thus, it is desirable for HSI classification methods to be able to select suitable parts from hundreds of input bands. In this letter, a band attention module (BAM) is proposed to implement deep learning based HSI classification with the capacity for band selection or weighting. The proposed BAM can be seen as a plug-and-play complementary component of existing classification networks that fully considers the adverse effects caused by band redundancy when using convolutional neural networks (CNNs) for HSI classification. Unlike most deep learning methods used for HSIs, the band attention module, which is customized according to the characteristics of hyperspectral images, is embedded in ordinary CNNs for better performance. At the same time, unlike classical band selection or weighting methods, the proposed method is trained end to end rather than in separate stages. Experiments are carried out on two HSI benchmark datasets. Compared to some classical and advanced deep learning methods, numerical simulations under different evaluation criteria show that the proposed method has good performance. Last but not least, some advanced CNNs are combined with the proposed BAM for better performance.
LG Mar 27, 2019
Kernel based regression with robust loss function via iteratively reweighted least squares
Hongwei Dong, Liming Yang
Least squares kernel based methods have been widely used in regression problems due to their simple implementation and good generalization performance. Among them, least squares support vector regression (LS-SVR) and extreme learning machine (ELM) are popular techniques. However, their noise sensitivity is a major bottleneck. To address this issue, a generalized loss function, called $\ell_s$-loss, is proposed in this paper. With the support of this novel loss function, two kernel based regressors are constructed by replacing the $\ell_2$-loss in LS-SVR and ELM with the proposed $\ell_s$-loss for better noise robustness. Important properties of $\ell_s$-loss, including robustness, asymmetry and asymptotic approximation behaviors, are verified theoretically. Moreover, iteratively reweighted least squares (IRLS) is utilized to optimize and interpret the proposed methods from a weighted viewpoint. The convergence of the proposed methods is proved, and detailed analyses of robustness are given. Experiments on both artificial and benchmark datasets confirm the validity of the proposed methods.
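To make the "weighted viewpoint" of IRLS concrete, here is a minimal sketch of the general technique. It makes two loudly stated simplifications: the abstract does not define the $\ell_s$-loss, so the well-known Huber weights are swapped in as a stand-in, and a plain line fit $y = ax + b$ replaces the kernel regressors. The names `weighted_line_fit`, `huber_weight`, and `irls_line_fit` and the toy data are hypothetical.

```python
def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def huber_weight(residual, delta):
    """IRLS weight for the Huber loss (a stand-in for the paper's loss):
    1 for small residuals, delta/|r| for large ones."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r

def irls_line_fit(xs, ys, delta=1.0, iters=50):
    """Alternate between a weighted least squares fit and a weight update."""
    ws = [1.0] * len(xs)  # first pass is ordinary least squares
    a, b = weighted_line_fit(xs, ys, ws)
    for _ in range(iters):
        ws = [huber_weight(y - (a * x + b), delta) for x, y in zip(xs, ys)]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b

# Five points on y = 2x + 1 plus one gross outlier at x = 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 50.0]
a_ls, b_ls = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rb, b_rb = irls_line_fit(xs, ys)
print(abs(a_rb - 2.0) < abs(a_ls - 2.0))
```

Each iteration solves an ordinary weighted least squares problem, and samples with large residuals receive small weights, which is why the outlier pulls the robust slope far less than it pulls the plain least squares slope.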
CV Mar 24, 2019
Efficiently utilizing complex-valued PolSAR image data via a multi-task deep learning framework
Lamei Zhang, Hongwei Dong, Bin Zou
Convolutional neural networks (CNNs) have been widely used to improve the accuracy of polarimetric synthetic aperture radar (PolSAR) image classification. However, in most studies, the difference between PolSAR images and optical images is rarely considered. Most of the existing CNNs are not tailored for the task of PolSAR image classification, in which complex-valued PolSAR data have been simply equated to real-valued data to fit the optical image processing architectures and avoid complex-valued operations. This is one of the reasons CNNs are unable to reach their full capability in PolSAR classification. To solve the above problem, the objective of this paper is to develop a tailored CNN framework for PolSAR image classification, which can be approached from two aspects: seeking a better form of PolSAR data as the input of CNNs and building matched CNN architectures based on the proposed input form. In this paper, considering the properties of complex-valued numbers, the amplitude and phase of complex-valued PolSAR data are extracted as the input for the first time to maintain the integrity of the original information while avoiding immature complex-valued operations. Then, a multi-task CNN (MCNN) architecture is proposed to match the improved input form and achieve better classification results. Furthermore, depthwise separable convolution is introduced to the proposed architecture in order to better extract information from the phase. Experiments on three PolSAR benchmark datasets not only prove that using amplitude and phase as the input does contribute to the improvement of PolSAR classification, but also verify the adaptability between the improved input form and the well-designed architectures.
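The amplitude-and-phase input form described in the abstract above can be sketched with Python's standard `cmath` module. This is only an illustration of why the representation keeps the original information intact: the function name `amplitude_phase` and the toy pixel values are hypothetical, and the paper's actual preprocessing pipeline is not specified here.

```python
import cmath

def amplitude_phase(pixels):
    """Split complex-valued samples into two real-valued channels."""
    amplitudes = [abs(z) for z in pixels]        # magnitude |z|
    phases = [cmath.phase(z) for z in pixels]    # angle in (-pi, pi]
    return amplitudes, phases

# Toy complex-valued samples standing in for PolSAR pixel values.
pixels = [3 + 4j, -1 + 0j, 0 + 2j]
amp, pha = amplitude_phase(pixels)

# The two real channels together reconstruct the complex data, so no
# information is lost even though the network never sees complex numbers.
recon = [cmath.rect(a, p) for a, p in zip(amp, pha)]
print(amp[0])
```

Because the pair (amplitude, phase) maps back to the complex value via `cmath.rect`, a real-valued network fed with both channels has access to the full signal, unlike an input that keeps only the magnitude.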