CV | Jul 8, 2024
Rethinking Image Skip Connections in StyleGAN2
Seung Park, Yong-Goo Shin
Various models based on StyleGAN have gained significant traction in image synthesis owing to their robust training stability and superior performance. Within the StyleGAN framework, image skip connections are favored over traditional residual connections. However, this preference rests solely on empirical observations; no in-depth mathematical analysis of it has been conducted yet. To address this gap, this brief elucidates the mathematical meaning of the image skip connection and introduces a new method, termed the image squeeze connection, which significantly improves the quality of image synthesis. Specifically, we analyze the image skip connection to reveal its limitation and then introduce the proposed method, which not only boosts GAN performance but also reduces the number of network parameters required. Extensive experiments on various datasets demonstrate that the proposed method consistently enhances the performance of state-of-the-art StyleGAN-based models. We believe that our findings represent a significant advancement in image synthesis and suggest a new direction for future research and applications.
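The baseline this brief analyzes, the image skip connection of the StyleGAN2 generator, can be sketched as follows: each resolution converts its features to an RGB image with a toRGB layer, and the running image is upsampled and added to the next resolution's output. This is a minimal NumPy sketch under simplifying assumptions (nearest-neighbour upsampling instead of StyleGAN2's filtered upsampler, and an unmodulated 1x1 toRGB convolution). The proposed image squeeze connection is not shown because the abstract does not describe its form.

```python
import numpy as np


def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array.

    Simplification: StyleGAN2 itself uses a filtered (FIR) upsampler.
    """
    return img.repeat(2, axis=1).repeat(2, axis=2)


def to_rgb(feat, weight):
    """Plain (unmodulated) 1x1 convolution mapping (C_in, H, W) features to (3, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def skip_generator_output(features, rgb_weights):
    """Sum per-resolution RGB outputs through image skip connections.

    features:    list of (C_i, H_i, W_i) arrays, each twice the resolution of the previous one.
    rgb_weights: list of (3, C_i) toRGB matrices, one per resolution.
    """
    rgb = None
    for feat, weight in zip(features, rgb_weights):
        out = to_rgb(feat, weight)
        # Image skip connection: upsample the running image and add the new RGB contribution.
        rgb = out if rgb is None else upsample2x(rgb) + out
    return rgb


rng = np.random.default_rng(0)
features = [rng.standard_normal((8, 4 * 2**i, 4 * 2**i)) for i in range(3)]  # 4x4, 8x8, 16x16
rgb_weights = [rng.standard_normal((3, 8)) for _ in range(3)]
print(skip_generator_output(features, rgb_weights).shape)  # (3, 16, 16)
```

Because upsampling is linear, the final image is simply the sum of every resolution's RGB output, each upsampled to full size.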
CV | Oct 18, 2022
Improving GANs with a Feature Cycling Generator
Seung Park, Yong-Goo Shin
Generative adversarial networks (GANs), which consist of a generator and a discriminator, have significantly advanced image generation. Existing works typically build their generators by stacking multiple residual blocks, since doing so eases generator training. However, some recent papers have pointed out the limitations of the residual block and proposed new architectural units that improve GAN performance. Following this trend, this paper presents a novel unit, called the feature cycling block (FCB), which achieves impressive results in the image generation task. Specifically, the FCB has two branches: a memory branch and an image branch. The memory branch keeps meaningful information at each stage of the generator, whereas the image branch takes useful features from the memory branch to produce a high-quality image. To show the capability of the proposed method, we conducted extensive experiments on various datasets, including CIFAR-10, CIFAR-100, FFHQ, AFHQ, and subsets of LSUN. Experimental results demonstrate the substantial superiority of our approach over the baseline without requiring any additional objective functions or training techniques. For instance, the proposed method improves the Frechet inception distance (FID) of StyleGAN2 from 4.89 to 3.72 on the FFHQ dataset and from 6.64 to 5.57 on the LSUN Bed dataset. We believe that the pioneering attempt presented in this paper could inspire the community to design better generator architectures, as well as training objectives or techniques compatible with the proposed method.
CV | Feb 3, 2025
FSPGD: Rethinking Black-box Attacks on Semantic Segmentation
Eun-Sol Park, MiSo Park, Seung Park et al.
Transferability, the ability of adversarial examples crafted for one model to deceive other models, is crucial for black-box attacks. Despite advancements in attack methods for semantic segmentation, transferability remains limited, reducing their effectiveness in real-world applications. To address this, we introduce the Feature Similarity Projected Gradient Descent (FSPGD) attack, a novel black-box approach that enhances both attack performance and transferability. Unlike conventional segmentation attacks that rely on output predictions for gradient calculation, FSPGD computes gradients from intermediate layer features. Specifically, our method introduces a loss function that targets local information by comparing features between clean images and adversarial examples, while also disrupting contextual information by accounting for spatial relationships between objects. Experiments on Pascal VOC 2012 and Cityscapes datasets demonstrate that FSPGD achieves superior transferability and attack performance, establishing a new state-of-the-art benchmark. Code is available at https://github.com/KU-AIVS/FSPGD.
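The core idea of computing attack gradients from intermediate features rather than from output predictions can be illustrated with a toy projected gradient descent (PGD) loop. This is a minimal sketch, not the FSPGD loss: it assumes a hypothetical linear "intermediate layer" `W` so the gradient can be written by hand, and it keeps only a cosine-similarity term between clean and adversarial features, omitting the contextual term that accounts for spatial relationships between objects. Each step is an L-infinity sign-gradient step that lowers the feature similarity, followed by projection back onto the eps-ball around the clean input.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_grad_wrt_a(a, b):
    """Gradient of cos(a, b) with respect to a (b held fixed)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return b / (na * nb) - (a @ b) * a / (na**3 * nb)


def feature_similarity_pgd(x, W, eps=0.1, alpha=0.02, steps=20, seed=0):
    """L-inf PGD that lowers the cosine similarity between clean and adversarial features.

    W is a hypothetical linear feature extractor standing in for an intermediate layer.
    """
    rng = np.random.default_rng(seed)
    feat_clean = W @ x
    # Random start: at x_adv == x the cosine gradient is exactly zero.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    history = [cosine(W @ x_adv, feat_clean)]
    for _ in range(steps):
        grad = W.T @ cosine_grad_wrt_a(W @ x_adv, feat_clean)  # chain rule through W
        x_adv = x_adv - alpha * np.sign(grad)                   # descend on similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)                # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                        # keep a valid pixel range
        history.append(cosine(W @ x_adv, feat_clean))
    return x_adv, history


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=64)   # toy "image" flattened to a vector
W = rng.standard_normal((16, 64))    # hypothetical intermediate-layer weights
x_adv, history = feature_similarity_pgd(x, W)
print(f"feature similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

In a real attack the gradient would come from automatic differentiation through a surrogate segmentation network, and the crafted examples would then be transferred to unseen target models.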
CV | Jan 27, 2022
Effective Shortcut Technique for GAN
Seung Park, Cheol-Hwan Yoo, Yong-Goo Shin
In recent years, generative adversarial network (GAN)-based image generation techniques have designed their generators by stacking multiple residual blocks. The residual block generally contains a shortcut, i.e., a skip connection, which effectively supports information propagation in the network. In this paper, we propose a novel shortcut method, called the gated shortcut, which not only retains the strengths of the residual block but also further boosts GAN performance. More specifically, based on a gating mechanism, the proposed method leads the residual block to keep (or remove) information that is relevant (or irrelevant) to the image being generated. To demonstrate that the proposed method significantly improves GAN performance, this paper provides extensive experimental results on various standard datasets such as CIFAR-10, CIFAR-100, LSUN, and tiny-ImageNet. Quantitative evaluations show that the gated shortcut achieves impressive GAN performance in terms of Frechet inception distance (FID) and Inception score (IS). For instance, the proposed method improves the FID and IS scores on the tiny-ImageNet dataset from 35.13 to 27.90 and from 20.23 to 23.42, respectively.
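One way to picture a gated shortcut is a residual block whose identity path is multiplied by a learned gate in (0, 1), so each shortcut channel can be kept or suppressed. The sketch below is hypothetical: the gate form (a sigmoid over a linear map of the input concatenated with the residual-branch output), the two-layer residual branch, and all parameter names are assumptions for illustration, not the paper's exact design. Features are treated as (N, C) per-pixel vectors, so 1x1 convolutions become matrix products.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def residual_branch(x, W1, W2):
    """Hypothetical residual branch F(x): two 1x1 'convolutions' with a ReLU in between."""
    return np.maximum(x @ W1, 0.0) @ W2


def gated_shortcut_block(x, W1, W2, Wg, bg):
    """y = F(x) + g * x, where the gate g in (0, 1) decides how much of each shortcut channel to keep.

    x: (N, C); W1: (C, H); W2: (H, C); Wg: (2C, C); bg: (C,). All shapes are assumptions.
    """
    fx = residual_branch(x, W1, W2)
    gate = sigmoid(np.concatenate([x, fx], axis=-1) @ Wg + bg)
    return fx + gate * x, gate


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
y, gate = gated_shortcut_block(
    x, rng.standard_normal((4, 8)), rng.standard_normal((8, 4)), rng.standard_normal((8, 4)), np.zeros(4)
)
print(y.shape, gate.shape)  # (5, 4) (5, 4)
```

When the gate saturates at 1, the block reduces to an ordinary residual block, y = F(x) + x; when it saturates at 0, the shortcut is removed entirely.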
CV | Jan 26, 2022
Image Generation with Self Pixel-wise Normalization
Yoon-Jae Yeo, Min-Cheol Sagong, Seung Park et al.
Region-adaptive normalization (RAN) methods have been widely used in generative adversarial network (GAN)-based image-to-image translation techniques. However, since these approaches need a mask image to infer the pixel-wise affine transformation parameters, they cannot be applied to general image generation models that have no paired mask images. To resolve this problem, this paper presents a novel normalization method, called self pixel-wise normalization (SPN), which effectively boosts generative performance by performing a pixel-adaptive affine transformation without a mask image. In our method, the transformation parameters are derived from a self-latent mask that divides the feature map into foreground and background regions. Visualizations of the self-latent masks show that SPN effectively captures the single object to be generated as the foreground. Since the proposed method produces the self-latent mask without external data, it is easily applicable to existing generative models. Extensive experiments on various datasets reveal that the proposed method significantly improves image generation performance in terms of Frechet inception distance (FID) and Inception score (IS).
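A self pixel-wise normalization layer can be sketched as three steps: normalize the features, predict a soft foreground mask from the features themselves, and blend separate foreground and background affine parameters pixel by pixel using that mask. This is a minimal NumPy sketch under stated assumptions: instance-style normalization over (H, W), a single-channel 1x1 convolution plus sigmoid as the mask predictor, and the parameter names are all hypothetical rather than taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def self_pixelwise_norm(x, w_mask, gamma_fg, beta_fg, gamma_bg, beta_bg, eps=1e-5):
    """x: (C, H, W); w_mask: (C,); gamma_*/beta_*: (C,). Returns output and the (H, W) soft mask."""
    # Instance-style normalization per channel (an assumption for this sketch).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Self-latent mask predicted from the features themselves (hypothetical 1x1 conv to one channel).
    mask = sigmoid(np.einsum("c,chw->hw", w_mask, x))[None]  # (1, H, W), values in (0, 1)
    # Pixel-wise affine parameters: foreground/background params blended by the mask.
    gamma = mask * gamma_fg[:, None, None] + (1.0 - mask) * gamma_bg[:, None, None]
    beta = mask * beta_fg[:, None, None] + (1.0 - mask) * beta_bg[:, None, None]
    return gamma * x_hat + beta, mask[0]


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))
params = [rng.standard_normal(4) for _ in range(5)]  # w_mask, gamma_fg, beta_fg, gamma_bg, beta_bg
out, mask = self_pixelwise_norm(x, *params)
print(out.shape, mask.shape)  # (4, 6, 6) (6, 6)
```

If the foreground and background parameters are equal, the layer collapses to ordinary normalization with a per-channel affine transform; the mask only matters once the two parameter sets differ.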
CV | Dec 30, 2021
A Novel Generator with Auxiliary Branch for Improving GAN Performance
Seung Park, Yong-Goo Shin
The generator in a generative adversarial network (GAN) learns image generation in a coarse-to-fine manner, in which earlier layers learn the overall structure of the image and later ones refine the details. To propagate the coarse information well, recent works usually build their generators by stacking multiple residual blocks. Although the residual block can produce high-quality images and be trained stably, it often impedes the information flow in the network. To alleviate this problem, this brief introduces a novel generator architecture that produces the image by combining features obtained through two different branches: the main and auxiliary branches. The main branch produces the image by passing through multiple residual blocks, whereas the auxiliary branch conveys the coarse information in the earlier layers to the later ones. To combine the features of the main and auxiliary branches successfully, we also propose a gated feature fusion module that controls the information flow in those branches. To prove the superiority of the proposed method, this brief provides extensive experiments using various standard datasets, including CIFAR-10, CIFAR-100, LSUN, CelebA-HQ, AFHQ, and tiny-ImageNet. Furthermore, we conducted various ablation studies to demonstrate the generalization ability of the proposed method. Quantitative evaluations show that the proposed method exhibits impressive GAN performance in terms of Inception score (IS) and Frechet inception distance (FID). For instance, the proposed method improves the FID and IS scores on the tiny-ImageNet dataset from 35.13 to 25.00 and from 20.23 to 25.57, respectively.
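The gated feature fusion idea can be sketched as a learned convex combination of main-branch and auxiliary-branch features. This is a hypothetical minimal form: it assumes a per-channel sigmoid gate computed from the concatenated features, and it assumes the auxiliary features have already been resized to the main branch's resolution. The paper's actual module may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_feature_fusion(main, aux, w_gate, b_gate):
    """Blend main- and auxiliary-branch features with a learned per-channel gate.

    main, aux: (N, C) features at the same resolution (aux assumed already resized).
    w_gate: (2C, C) and b_gate: (C,) are hypothetical gate parameters.
    """
    gate = sigmoid(np.concatenate([main, aux], axis=-1) @ w_gate + b_gate)
    # Convex combination: gate -> 1 keeps the main branch, gate -> 0 keeps the auxiliary branch.
    return gate * main + (1.0 - gate) * aux, gate


rng = np.random.default_rng(0)
main = rng.standard_normal((6, 4))  # e.g. output of the residual-block stack
aux = rng.standard_normal((6, 4))   # coarse features carried over from an earlier layer
fused, gate = gated_feature_fusion(main, aux, rng.standard_normal((8, 4)), np.zeros(4))
print(fused.shape)  # (6, 4)
```

Because the output is a per-element convex combination, it always lies between the two branch values, so the gate can only reweight information rather than amplify either branch.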
CV | Nov 30, 2021
Generative Convolution Layer for Image Generation
Seung Park, Yong-Goo Shin
This paper introduces a novel convolution method, called generative convolution (GConv), which is simple yet effective for improving generative adversarial network (GAN) performance. Unlike standard convolution, GConv first selects useful kernels compatible with the given latent vector and then linearly combines the selected kernels to make latent-specific kernels. Using the latent-specific kernels, the proposed method produces latent-specific features that encourage the generator to produce high-quality images. This approach is simple but surprisingly effective. First, GAN performance is significantly improved with little additional hardware cost. Second, GConv can be applied to existing state-of-the-art generators without modifying the network architecture. To demonstrate the superiority of GConv, this paper provides extensive experiments using various standard datasets, including CIFAR-10, CIFAR-100, LSUN-Church, CelebA, and tiny-ImageNet. Quantitative evaluations show that GConv significantly boosts the performance of unconditional and conditional GANs in terms of Inception score (IS) and Frechet inception distance (FID). For example, the proposed method improves the FID and IS scores on the tiny-ImageNet dataset from 35.13 to 29.76 and from 20.23 to 22.64, respectively.
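The select-then-combine step of GConv can be sketched as scoring a bank of candidate kernels against the latent vector, keeping the top-scoring ones, and mixing them with softmax weights. This is a minimal sketch under assumptions: the linear scoring matrix `W_sel`, the top-k selection rule, the softmax mixing weights, and the use of 1x1 kernels (so the convolution is a single einsum) are hypothetical choices for illustration.

```python
import numpy as np


def latent_specific_kernel(z, bank, W_sel, top_k=2):
    """Select top_k kernels compatible with latent z and combine them linearly.

    z: (D,) latent; bank: (K, C_out, C_in) candidate 1x1 kernels; W_sel: (K, D) hypothetical scorer.
    """
    scores = W_sel @ z                       # (K,) compatibility of each kernel with z
    idx = np.argsort(scores)[-top_k:]        # indices of the top_k highest-scoring kernels
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                             # softmax weights over the selected kernels
    kernel = np.tensordot(w, bank[idx], axes=1)  # weighted sum -> (C_out, C_in)
    return kernel, idx, w


def gconv_1x1(x, z, bank, W_sel, top_k=2):
    """Apply the latent-specific 1x1 kernel to (C_in, H, W) features."""
    kernel, _, _ = latent_specific_kernel(z, bank, W_sel, top_k)
    return np.einsum("oc,chw->ohw", kernel, x)


rng = np.random.default_rng(0)
z = rng.standard_normal(8)
bank = rng.standard_normal((4, 3, 5))
W_sel = rng.standard_normal((4, 8))
print(gconv_1x1(rng.standard_normal((5, 6, 6)), z, bank, W_sel).shape)  # (3, 6, 6)
```

Different latent vectors yield different selections and mixing weights, so the same layer produces latent-specific features without changing the network's overall structure.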
CV | Jan 19, 2021
PConv: Simple yet Effective Convolutional Layer for Generative Adversarial Network
Seung Park, Yoon-Jae Yeo, Yong-Goo Shin
This paper presents a novel convolutional layer, called perturbed convolution (PConv), which focuses on achieving two goals simultaneously: improving generative adversarial network (GAN) performance and alleviating the memorization problem, in which the discriminator memorizes all images from a given dataset as training progresses. In PConv, perturbed features are generated by randomly disturbing the input tensor before performing the convolution operation. This approach is simple but surprisingly effective. First, to produce a similar output even with the perturbed tensor, each layer in the discriminator must learn robust features with a small local Lipschitz value. Second, since the input tensor is randomly perturbed during training, much like dropout in neural networks, the memorization problem can be alleviated. To show the generalization ability of the proposed method, we conducted extensive experiments with various loss functions and datasets, including CIFAR-10, CelebA, CelebA-HQ, LSUN, and tiny-ImageNet. The quantitative evaluations demonstrate that PConv effectively boosts the performance of GAN and conditional GAN in terms of Frechet inception distance (FID).
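PConv's key step, perturbing the input tensor at training time before the convolution, can be sketched as below. The abstract does not specify the perturbation, so this sketch assumes a hypothetical one: zeroing a random fraction of spatial positions, applied only during training as with dropout. A 1x1 convolution keeps the example short; the function names are illustrative.

```python
import numpy as np


def perturb_positions(x, rate, rng):
    """Hypothetical perturbation: zero a random fraction `rate` of spatial positions of (C, H, W) x."""
    keep = rng.random(x.shape[1:]) >= rate   # (H, W) boolean mask, shared across channels
    return x * keep[None]


def pconv_1x1(x, weight, rate=0.1, training=True, rng=None):
    """Perturb the input (training only), then apply a 1x1 convolution with weight (C_out, C_in)."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        x = perturb_positions(x, rate, rng)
    return np.einsum("oc,chw->ohw", weight, x)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
weight = rng.standard_normal((2, 4))
clean = pconv_1x1(x, weight, training=False)          # inference: no perturbation
noisy = pconv_1x1(x, weight, rate=0.3, rng=rng)        # training: perturbed input
print(clean.shape, noisy.shape)  # (2, 8, 8) (2, 8, 8)
```

As with dropout, the perturbation is disabled at inference, so the trained layer behaves like a standard convolution once training ends.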
LG | Nov 19, 2019
Simple yet Effective Way for Improving the Performance of GAN
Yong-Goo Shin, Yoon-Jae Yeo, Sung-Jea Ko
In adversarial learning, the discriminator often fails to guide the generator successfully because it distinguishes between real and generated images using trivial or non-robust features. To alleviate this problem, this brief presents a simple but effective way to improve the performance of generative adversarial networks (GANs) without imposing significant training overhead or modifying the network architectures of existing methods. The proposed method employs a novel cascading rejection (CR) module for the discriminator, which extracts multiple non-overlapping features in an iterative manner using the vector rejection operation. Since the extracted diverse features prevent the discriminator from concentrating on non-meaningful features, the discriminator can effectively guide the generator to produce images that are more similar to the real images. In addition, since the proposed CR module requires only a few simple vector operations, it can be readily applied to existing frameworks with marginal training overhead. Quantitative evaluations on various datasets, including CIFAR-10, CelebA, CelebA-HQ, LSUN, and tiny-ImageNet, confirm that the proposed method significantly improves the performance of GAN and conditional GAN in terms of Frechet inception distance (FID), which reflects both the diversity and the visual quality of the generated images.
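The vector rejection operation behind the cascading rejection module removes from a feature vector v its component along a weight vector u, rej_u(v) = v - ((v·u)/(u·u)) u, so each later output is computed from features orthogonal to the previous weight. The sketch below assumes the module acts on the discriminator's final feature vector with one weight vector per stage, each stage producing one scalar output; these details are assumptions rather than the paper's exact formulation.

```python
import numpy as np


def reject(v, u):
    """Vector rejection of v from u: the component of v orthogonal to u."""
    return v - (v @ u) / (u @ u) * u


def cascading_rejection(v, weights):
    """Compute one scalar output per weight vector, rejecting each used direction from the features.

    Each stage sees only the part of the features not already explained by earlier weights.
    """
    outputs, feats = [], v
    for w in weights:
        outputs.append(float(feats @ w))  # this stage's discriminator output
        feats = reject(feats, w)          # remove the direction just used
    return outputs


print(cascading_rejection(np.array([3.0, 4.0]), [np.array([1.0, 0.0]), np.array([0.0, 1.0])]))  # [3.0, 4.0]
```

Passing the weights `[u, 2 * u]` shows the non-overlapping behaviour: the second output is numerically zero because everything along u has already been removed, so later stages are forced to rely on different features.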
CV | Nov 18, 2019
Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations
Cheol-hwan Yoo, Seo-won Ji, Yong-goo Shin et al.
3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods using convolutional neural networks (CNNs) have shown notable improvements in accuracy, most of them rely on complex network structures without fully exploiting the articulated structure of the hand. A hand, which is an articulated object, is composed of six local parts: the palm and five independent fingers. Each finger consists of sequential joints that provide constrained motion, referred to as a kinematic chain. In this paper, we propose a hierarchically structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D positions of the palm and the five fingers independently. The palm position is predicted via fully connected layers. The sequential joint positions of each finger are obtained using a recurrent neural network (RNN) to capture the spatial dependencies between adjacent joints. The output features of the palm and finger branches are then concatenated to estimate the global hand position. HCRNN directly takes the depth map as input without time-consuming data conversion into representations such as 3D voxels and point clouds. Experimental results on public datasets demonstrate that the proposed HCRNN not only outperforms most 2D CNN-based methods that take the depth image as input but also achieves results competitive with state-of-the-art 3D CNN-based methods, while running at a highly efficient 285 fps on a single GPU.
CV | Jun 3, 2019
cGANs with Conditional Convolution Layer
Min-Cheol Sagong, Yong-Goo Shin, Yoon-Jae Yeo et al.
Conditional generative adversarial networks (cGANs) have been widely researched to generate class-conditional images using a single generator. However, in conventional cGAN techniques, it is still challenging for the generator to learn condition-specific features, since a standard convolutional layer with the same weights is used regardless of the condition. In this paper, we propose a novel convolutional layer, called the conditional convolution layer, which directly generates different feature maps by employing weights adjusted according to the condition. More specifically, in each conditional convolution layer, the weights are conditioned in a simple but effective way through filter-wise scaling and channel-wise shifting operations. In contrast to conventional methods, the proposed method can effectively handle condition-specific characteristics with a single generator. Experimental results on the CIFAR, LSUN, and ImageNet datasets show that the generator with the proposed conditional convolution layer achieves higher-quality conditional image generation than one with the standard convolution layer.
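The filter-wise scaling and channel-wise shifting of a shared kernel can be sketched as follows. This is a minimal sketch assuming that "filter-wise" means one scale per output filter and "channel-wise" means one shift per input channel, each looked up from a per-class table; the exact indexing in the paper may differ, and 1x1 kernels stand in for full k x k kernels.

```python
import numpy as np


def conditional_weight(W, gamma, beta, c):
    """Condition a shared kernel on class c.

    W: (C_out, C_in) shared 1x1 kernel; gamma: (n_classes, C_out) per-filter scales;
    beta: (n_classes, C_in) per-input-channel shifts (indexing is an assumption).
    """
    return gamma[c][:, None] * W + beta[c][None, :]


def cond_conv_1x1(x, W, gamma, beta, c):
    """Convolve (C_in, H, W) features with the class-conditioned 1x1 kernel."""
    return np.einsum("oc,chw->ohw", conditional_weight(W, gamma, beta, c), x)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 5))
W = rng.standard_normal((6, 4))
gamma = rng.standard_normal((3, 6))  # 3 classes, 6 output filters
beta = rng.standard_normal((3, 4))   # 3 classes, 4 input channels
print(cond_conv_1x1(x, W, gamma, beta, c=2).shape)  # (6, 5, 5)
```

With all scales set to 1 and all shifts to 0, the layer reduces to a standard convolution, so the per-class tables only add a small number of parameters on top of the shared kernel.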
CV | May 22, 2019
PEPSI++: Fast and Lightweight Network for Image Inpainting
Yong-Goo Shin, Min-Cheol Sagong, Yoon-Jae Yeo et al.
Among the various generative adversarial network (GAN)-based image inpainting methods, a coarse-to-fine network with a contextual attention module (CAM) has shown remarkable performance. However, owing to its two stacked generative networks, the coarse-to-fine network requires substantial computational resources, such as convolution operations and network parameters, which results in low speed. To address this problem, we propose a novel network architecture called PEPSI (parallel extended-decoder path for semantic inpainting network), which aims to reduce hardware costs and improve inpainting performance. PEPSI consists of a single shared encoding network and parallel decoding networks called the coarse and inpainting paths. The coarse path produces a preliminary inpainting result to train the encoding network to predict features for the CAM. Simultaneously, the inpainting path produces higher-quality inpainting results using the refined features reconstructed via the CAM. In addition, we propose Diet-PEPSI, which significantly reduces the number of network parameters while maintaining performance. In Diet-PEPSI, to capture global contextual information at low hardware cost, we propose novel rate-adaptive dilated convolutional layers, which employ common weights but produce dynamic features depending on the given dilation rates. Extensive experiments comparing against state-of-the-art image inpainting methods demonstrate that both PEPSI and Diet-PEPSI improve quantitative scores, i.e., the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), and significantly reduce hardware costs such as computational time and the number of network parameters.
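The rate-adaptive dilated convolution can be sketched as one shared kernel that is modulated per dilation rate and then applied with that rate. The abstract only says these layers employ common weights but produce dynamic features depending on the dilation rate, so the per-rate scale-and-shift modulation below is an assumption, and a 1D signal keeps the example small.

```python
import numpy as np


def dilated_conv1d(x, k, rate):
    """'Valid' 1D dilated correlation of signal x with kernel k at the given dilation rate."""
    span = (len(k) - 1) * rate
    # Each output sums k[j] * x[i + j * rate]; the strided slice picks exactly those inputs.
    return np.array([k @ x[i : i + span + 1 : rate] for i in range(len(x) - span)])


def rate_adaptive_dilated_conv1d(x, shared_k, scale, shift, rate):
    """Modulate the shared kernel with hypothetical per-rate (scale, shift) values, then convolve."""
    k = scale[rate] * shared_k + shift[rate]
    return dilated_conv1d(x, k, rate)


x = np.arange(8.0)
shared = np.array([1.0, 1.0])
scale = {1: 2.0, 2: 1.0}  # hypothetical learned per-rate scales
shift = {1: 0.0, 2: 0.0}  # hypothetical learned per-rate shifts
print(rate_adaptive_dilated_conv1d(x, shared, scale, shift, rate=2).tolist())  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

Only the small per-rate tables grow with the number of dilation rates; the kernel itself is stored once, which is where the parameter savings come from.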
IV | May 15, 2019
Unsupervised Deep Contrast Enhancement with Power Constraint for OLED Displays
Yong-Goo Shin, Seung Park, Yoon-Jae Yeo et al.
Various power-constrained contrast enhancement (PCCE) techniques have been applied to organic light-emitting diode (OLED) displays to reduce the power demands of the display while preserving image quality. In this paper, we propose a new deep learning-based PCCE scheme that constrains the power consumption of OLED displays while enhancing the contrast of the displayed image. In the proposed method, power consumption is constrained by simply reducing the brightness by a certain ratio, whereas the perceived visual quality is preserved as much as possible by enhancing the contrast of the image using a convolutional neural network (CNN). Furthermore, our CNN can learn the PCCE technique without a reference image through unsupervised learning. Experimental results show that the proposed method is superior to conventional ones in terms of image quality assessment metrics such as the visual saliency-induced index (VSI) and the measure of enhancement (EME).
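The brightness-reduction part of the scheme can be made concrete with a common simplified OLED power model in which power is proportional to the gamma-decoded pixel intensities. Under that assumption (gamma = 2.2 and equal channel weights are hypothetical values; real panels have different per-channel costs), scaling every pixel by s scales the power by s^γ, so reaching a target power ratio r requires s = r^(1/γ). The CNN-based contrast enhancement that compensates for the dimming is not sketched here.

```python
import numpy as np


def oled_power(img, gamma=2.2, channel_weights=(1.0, 1.0, 1.0)):
    """Hypothetical power proxy: weighted sum of gamma-decoded RGB intensities.

    img: (H, W, 3) array with values in [0, 1].
    """
    return float(np.sum((img ** gamma) * np.asarray(channel_weights)))


def dim_to_power_ratio(img, ratio, gamma=2.2):
    """Uniformly scale pixel values so the power proxy drops to `ratio` of the original.

    Since (s * x) ** gamma == s ** gamma * x ** gamma, the scale factor is ratio ** (1 / gamma).
    """
    return img * ratio ** (1.0 / gamma)


rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(32, 32, 3))
dimmed = dim_to_power_ratio(img, 0.7)
print(round(oled_power(dimmed) / oled_power(img), 6))  # 0.7
```

Uniform dimming meets the power budget exactly under this model but lowers contrast, which is the loss the proposed CNN is trained to compensate for.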