CVAug 11, 2022
Self-Knowledge Distillation via DropoutHyoje Lee, Yeachan Park, Hyun Seo et al.
To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the data. In this paper, we propose a simple and effective self-knowledge distillation method using a dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models through a dropout sampling. Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations. Furthermore, this simple method can be easily combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
LGSep 12, 2022
Bounding the Rademacher Complexity of Fourier neural operatorsTaeyoung Kim, Myungjoo Kang
A Fourier neural operator (FNO) is one of the physics-inspired machine learning methods. In particular, it is a neural operator. In recent times, several types of neural operators have been developed, e.g., deep operator networks, Graph neural operator (GNO), and Multiwavelet-based operator (MWTO). Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. In this study, we investigated the bounding of the Rademacher complexity of the FNO based on specific group norms. Using capacity based on these norms, we bound the generalization error of the model. In addition, we investigated the correlation between the empirical generalization error and the proposed capacity of FNO. From the perspective of our result, we inferred that the type of group norms determines the information about the weights and architecture of the FNO model stored in the capacity. And then, we confirmed these inferences through experiments. Based on this fact, we gained insight into the impact of the number of modes used in the FNO model on the generalization error. And we got experimental results that followed our insights.
MLNov 25, 2022
Minimal Width for Universal Property of Deep RNNChang hoon Song, Geonho Hwang, Jun ho Lee et al.
A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.
CVOct 31, 2023Code
$p$-Poisson surface reconstruction in curl-free flow from point cloudsYesom Park, Taekyung Lee, Jooyoung Hahn et al.
The aim of this paper is the reconstruction of a smooth surface from an unorganized point cloud sampled by a closed surface, with the preservation of geometric shapes, without any further information other than the point cloud. Implicit neural representations (INRs) have recently emerged as a promising approach to surface reconstruction. However, the reconstruction quality of existing methods relies on ground truth implicit function values or surface normal vectors. In this paper, we show that proper supervision of partial differential equations and fundamental properties of differential vector fields are sufficient to robustly reconstruct high-quality surfaces. We cast the $p$-Poisson equation to learn a signed distance function (SDF) and the reconstructed surface is implicitly represented by the zero-level set of the SDF. For efficient training, we develop a variable splitting structure by introducing a gradient of the SDF as an auxiliary variable and impose the $p$-Poisson equation directly on the auxiliary variable as a hard constraint. Based on the curl-free property of the gradient field, we impose a curl-free constraint on the auxiliary variable, which leads to a more faithful reconstruction. Experiments on standard benchmark datasets show that the proposed INR provides a superior and robust reconstruction. The code is available at \url{https://github.com/Yebbi/PINC}.
LGFeb 20, 2023
Restoration based Generative ModelsJaemoo Choi, Yesom Park, Myungjoo Kang
Denoising diffusion models (DDMs) have recently attracted increasing attention by showing impressive synthesis quality. DDMs are built on a diffusion process that pushes data to the noise distribution and the models learn to denoise. In this paper, we establish the interpretation of DDMs in terms of image restoration (IR). Integrating IR literature allows us to use an alternative objective and diverse forward processes, not confining to the diffusion process. By imposing prior knowledge on the loss function grounded on MAP-based estimation, we eliminate the need for the expensive sampling of DDMs. Also, we propose a multi-scale training, which improves the performance compared to the diffusion process, by taking advantage of the flexibility of the forward process. Experimental results demonstrate that our model improves the quality and efficiency of both training and inference. Furthermore, we show the applicability of our model to inverse problems. We believe that our framework paves the way for designing a new type of flexible general generative model.
CVOct 24, 2023
Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challengeGregory Holste, Yiliang Zhou, Song Wang et al.
Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" $\unicode{x2013}$ there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
LGFeb 2, 2023
Learning PDE Solution Operator for Continuous Modeling of Time-SeriesYesom Park, Jaemoo Choi, Changyeon Yoon et al.
Learning underlying dynamics from data is important and challenging in many real-world scenarios. Incorporating differential equations (DEs) to design continuous networks has drawn much attention recently, however, most prior works make specific assumptions on the type of DEs, making the model specialized for particular problems. This work presents a partial differential equation (PDE) based framework which improves the dynamics modeling capability. Building upon the recent Fourier neural operator, we propose a neural operator that can handle time continuously without requiring iterative operations or specific grids of temporal discretization. A theoretical result demonstrating its universality is provided. We also uncover an intrinsic property of neural operators that improves data efficiency and model generalization by ensuring stability. Our model achieves superior accuracy in dealing with time-dependent PDEs compared to existing models. Furthermore, several numerical pieces of evidence validate that our method better represents a wide range of dynamics and outperforms state-of-the-art DE-based models in real-time-series applications. Our framework opens up a new way for a continuous representation of neural networks that can be readily adopted for real-world applications.
CVOct 11, 2022
Finding the global semantic representation in GAN through Frechet MeanJaewoong Choi, Geonho Hwang, Hyunsoo Cho et al.
The ideally disentangled latent space in GAN involves the global representation of latent space with semantic attribute coordinates. In other words, considering that this disentangled latent space is a vector space, there exists the global semantic basis where each basis component describes one attribute of generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space in GANs. This semantic basis represents sample-independent meaningful perturbations that change the same semantic attribute of an image on the entire latent space. The proposed global basis, called Fréchet basis, is derived by introducing Fréchet mean to the local semantic perturbations in a latent space. Fréchet basis is discovered in two stages. First, the global semantic subspace is discovered by the Fréchet mean in the Grassmannian manifold of the local semantic subspaces. Second, Fréchet basis is found by optimizing a basis of the semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that Fréchet basis provides better semantic factorization and robustness compared to the previous methods. Moreover, we suggest the basis refinement scheme for the previous methods. The quantitative experiments show that the refined basis achieves better semantic factorization while constrained on the same semantic subspace given by the previous method.
CVMay 26, 2022
Analyzing the Latent Space of GAN through Local Dimension EstimationJaewoong Choi, Geonho Hwang, Hyunsoo Cho et al.
The impressive success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces. In this paper, we approach this problem through a geometric analysis of latent spaces as a manifold. In particular, we propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model. The estimated local dimension is interpreted as the number of possible semantic variations from this latent variable. Moreover, this intrinsic dimension estimation enables unsupervised evaluation of disentanglement for a latent space. Our proposed metric, called Distortion, measures an inconsistency of intrinsic tangent space on the learned latent space. Distortion is purely geometric and does not require any additional attribute information. Nevertheless, Distortion shows a high correlation with the global-basis-compatibility and supervised disentanglement score. Our work is the first step towards selecting the most disentangled latent space among various latent spaces in a GAN without attribute labels.
IRJul 7, 2023
AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential RecommendationJaeheyoung Jeon, Jung Hyun Ryu, Jewoong Cho et al.
This paper presents a solution to the challenges faced by contrastive learning in sequential recommendation systems. In particular, it addresses the issue of false negative, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and mitigates the problem of falsely categorizing similar instances as dissimilar. Experimental results demonstrate performance enhancements compared to existing systems. The flexibility and applicability of the proposed approach across various recommendation scenarios further highlight its value in enhancing sequential recommendation systems.
42.4AIMay 12
Inference-Only Prompt Projection for Safe Text-to-Image Generation with TV GuaranteesMinhyuk Lee, Hyekyung Yoon, Myungjoo Kang
Text-to-Image (T2I) diffusion models enable high quality open ended synthesis, but practical use requires suppressing unsafe generations while preserving behavior on benign prompts. We study this tension relative to the frozen generator, using its prompt conditioned distribution as the preservation reference. Since T2I safety is commonly evaluated by bounded risk scores on generated images, total variation (TV) bounds how much expected risk can change from this reference. We call this fixed reference constraint the Safety-Prompt Alignment Tradeoff (SPAT): reducing expected unsafety requires prompt conditioned distributional deviation. To make this deviation selective and adjustable, we define the tau safe set as prompts whose reference risk is at most tau, and cast intervention as projection toward nearby prompts in this set. We propose Selective Prompt prOjecTion (SPOT), an inference time framework that approximates this projection without retraining the generator or learning a category specific rewriter. SPOT uses an LLM to rank candidate rewrites and a safeguard VLM to accept generated images under the same tau. Across four datasets and three diffusion backbones, SPOT achieves relative inappropriate (IP) score reductions from 14.2% to 44.4% over strong safety alignment baselines while keeping benign prompt behavior close to the fixed reference.
LGOct 4, 2023
Analyzing and Improving Optimal-Transport-based Adversarial NetworksJaemoo Choi, Jaewoong Choi, Myungjoo Kang
Optimal Transport (OT) problem aims to find a transport plan that bridges two distributions while minimizing a given cost function. OT theory has been widely utilized in generative modeling. In the beginning, OT distance has been used as a measure for assessing the distance between data and generated distributions. Recently, OT transport map between data and prior distributions has been utilized as a generative model. These OT-based generative models share a similar adversarial training objective. In this paper, we begin by unifying these OT-based adversarial methods within a single framework. Then, we elucidate the role of each component in training dynamics through a comprehensive analysis of this unified framework. Moreover, we suggest a simple but novel method that improves the previously best-performing OT-based model. Intuitively, our approach conducts a gradual refinement of the generated distribution, progressively aligning it with the data distribution. Our approach achieves a FID score of 2.51 on CIFAR-10 and 5.99 on CelebA-HQ-256, outperforming unified OT-based adversarial approaches.
LGSep 30, 2024
Beyond Derivative Pathology of PINNs: Variable Splitting Strategy with Convergence AnalysisYesom Park, Changhoon Song, Myungjoo Kang
Physics-informed neural networks (PINNs) have recently emerged as effective methods for solving partial differential equations (PDEs) in various problems. Substantial research focuses on the failure modes of PINNs due to their frequent inaccuracies in predictions. However, most are based on the premise that minimizing the loss function to zero causes the network to converge to a solution of the governing PDE. In this study, we prove that PINNs encounter a fundamental issue that the premise is invalid. We also reveal that this issue stems from the inability to regulate the behavior of the derivatives of the predicted solution. Inspired by the \textit{derivative pathology} of PINNs, we propose a \textit{variable splitting} strategy that addresses this issue by parameterizing the gradient of the solution as an auxiliary variable. We demonstrate that using the auxiliary variable eludes derivative pathology by enabling direct monitoring and regulation of the gradient of the predicted solution. Moreover, we prove that the proposed method guarantees convergence to a generalized solution for second-order linear PDEs, indicating its applicability to various problems.
CVJun 10, 2024Code
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal ModelYebin Lee, Imseong Park, Myungjoo Kang
Most existing image captioning evaluation metrics focus on assigning a single numerical score to a caption by comparing it with reference captions. However, these methods do not provide an explanation for the assigned score. Moreover, reference captions are expensive to acquire. In this paper, we propose FLEUR, an explainable reference-free metric to introduce explainability into image captioning evaluation metrics. By leveraging a large multimodal model, FLEUR can evaluate the caption against the image without the need for reference captions, and provide the explanation for the assigned score. We introduce score smoothing to align as closely as possible with human judgment and to be robust to user-defined grading criteria. FLEUR achieves high correlations with human judgment across various image captioning evaluation benchmarks and reaches state-of-the-art results on Flickr8k-CF, COMPOSITE, and Pascal-50S within the domain of reference-free evaluation metrics. Our source code and results are publicly available at: https://github.com/Yebin46/FLEUR.
CVMay 24, 2023Code
Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal TransportJaemoo Choi, Jaewoong Choi, Myungjoo Kang
Optimal Transport (OT) problem investigates a transport map that bridges two distributions while minimizing a given cost function. In this regard, OT between tractable prior distribution and data has been utilized for generative modeling tasks. However, OT-based methods are susceptible to outliers and face optimization challenges during training. In this paper, we propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT). Unlike OT, UOT relaxes the hard constraint on distribution matching. This approach provides better robustness against outliers, stability during training, and faster convergence. We validate these properties empirically through experiments. Moreover, we study the theoretical upper-bound of divergence between distributions in UOT. Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256. The code is available at \url{https://github.com/Jae-Moo/UOTM}.
LGMay 8, 2023Code
AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation SchemeYungi Jeong, Eunseok Yang, Jung Hyun Ryu et al.
Mechanical defects in real situations affect observation values and cause abnormalities in multivariate time series, such as sensor values or network data. To perceive abnormalities in such data, it is crucial to understand the temporal context and interrelation between variables simultaneously. The anomaly detection task for time series, especially for unlabeled data, has been a challenging problem, and we address it by applying a suitable data degradation scheme to self-supervised model training. We define four types of synthetic outliers and propose the degradation scheme in which a portion of input data is replaced with one of the synthetic outliers. Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context and detect unnatural sequences with high efficiency. Our model converts multivariate data points into temporal representations with relative position bias and yields anomaly scores from these representations. Our method, AnomalyBERT, shows a great capability of detecting anomalies contained in complex time series and surpasses previous state-of-the-art methods on five real-world benchmarks. Our code is available at https://github.com/Jhryu30/AnomalyBERT.
LGAug 4, 2024
Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Field Theory PerspectiveTaeyoung Kim, Myungjoo Kang
The Rectified Power Unit (RePU) activation function, a differentiable generalization of the Rectified Linear Unit (ReLU), has shown promise in constructing neural networks due to its smoothness properties. However, deep RePU networks often suffer from critical issues such as vanishing or exploding values during training, rendering them unstable regardless of hyperparameter initialization. Leveraging the perspective of effective field theory, we identify the root causes of these failures and propose the Modified Rectified Power Unit (MRePU) activation function. MRePU addresses RePU's limitations while preserving its advantages, such as differentiability and universal approximation properties. Theoretical analysis demonstrates that MRePU satisfies criticality conditions necessary for stable training, placing it in a distinct universality class. Extensive experiments validate the effectiveness of MRePU, showing significant improvements in training stability and performance across various tasks, including polynomial regression, physics-informed neural networks (PINNs) and real-world vision tasks. Our findings highlight the potential of MRePU as a robust alternative for building deep neural networks.
FAAug 9, 2024
Decomposition of one-layer neural networks via the infinite sum of reproducing kernel Banach spacesSeungcheol Shin, Myungjoo Kang
In this paper, we define the sum of RKBSs using the characterization theorem of RKBSs and show that the sum of RKBSs is compatible with the direct sum of feature spaces. Moreover, we decompose the integral RKBS into the sum of $p$-norm RKBSs. Finally, we provide applications for the structural understanding of the integral RKBS class.
LGFeb 8, 2024
Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal TransportJaemoo Choi, Jaewoong Choi, Myungjoo Kang
Wasserstein Gradient Flow (WGF) describes the gradient dynamics of probability density within the Wasserstein space. WGF provides a promising approach for conducting optimization over the probability distributions. Numerically approximating the continuous WGF requires the time discretization method. The most well-known method for this is the JKO scheme. In this regard, previous WGF models employ the JKO scheme and parametrize transport map for each JKO step. However, this approach results in quadratic training complexity $O(K^2)$ with the number of JKO step $K$. This severely limits the scalability of WGF models. In this paper, we introduce a scalable WGF-based generative model, called Semi-dual JKO (S-JKO). Our model is based on the semi-dual form of the JKO step, derived from the equivalence between the JKO step and the Unbalanced Optimal Transport. Our approach reduces the training complexity to $O(K)$. We demonstrate that our model significantly outperforms existing WGF-based generative models, achieving FID scores of 2.62 on CIFAR-10 and 5.46 on CelebA-HQ-256, which are comparable to state-of-the-art image generative models.
94.4NAApr 29
Drift-Free Conservative Dynamics from Quantized Interaction RulesPark Junhu, Youngsoo Ha, Myungjoo Kang
Conservation laws are conventionally discretized through floating-point flux evaluation, with invariants obtained by cancellation of approximate interface contributions and admissible weak solutions selected by reconstruction and Riemann solvers. Here we introduce an operator-level formulation in which conservative dynamics is realized as an exact discrete interaction rule on a quantized state space. The update is defined by an antisymmetric integer-transfer operator, which enforces conservation exactly at the arithmetic level and eliminates round-off drift from the primitive evolution \cite{highamAccuracyStabilityNumerical2002}. For scalar laws, monotone order-preserving transfers select admissible shock structures within the primitive update, rather than through flux reconstruction. Numerical experiments show that the interaction rule preserves high-frequency transport near the Nyquist limit and maintains sharply localized discontinuities in Burgers dynamics. The same construction extends to multidimensional problems and systems of conservation laws through oriented, vector-valued integer transfers. These results indicate that conservative dynamics admits an exact discrete realization in which both invariance and entropy selection are encoded at the operator level, rather than arising from approximate flux cancellation.
NAJan 3, 2024
Approximating Numerical Fluxes Using Fourier Neural Operators for Hyperbolic Conservation LawsTaeyoung Kim, Myungjoo Kang
Traditionally, classical numerical schemes have been employed to solve partial differential equations (PDEs) using computational methods. Recently, neural network-based methods have emerged. Despite these advancements, neural network-based methods, such as physics-informed neural networks (PINNs) and neural operators, exhibit deficiencies in robustness and generalization. To address these issues, numerous studies have integrated classical numerical frameworks with machine learning techniques, incorporating neural networks into parts of traditional numerical methods. In this study, we focus on hyperbolic conservation laws by replacing traditional numerical fluxes with neural operators. To this end, we developed loss functions inspired by established numerical schemes related to conservation laws and approximated numerical fluxes using Fourier neural operators (FNOs). Our experiments demonstrated that our approach combines the strengths of both traditional numerical schemes and FNOs, outperforming standard FNO methods in several respects. For instance, we demonstrate that our method is robust, has resolution invariance, and is feasible as a data-driven method. In particular, our method can make continuous predictions over time and exhibits superior generalization capabilities with out-of-distribution (OOD) samples, which are challenges that existing neural operator methods encounter.
LGMay 4, 2025
CASA: CNN Autoencoder-based Score Attention for Efficient Multivariate Long-term Time-series ForecastingMinhyuk Lee, HyeKyung Yoon, MyungJoo Kang
Multivariate long-term time series forecasting is critical for applications such as weather prediction, and traffic analysis. In addition, the implementation of Transformer variants has improved prediction accuracy. Following these variants, different input data process approaches also enhanced the field, such as tokenization techniques including point-wise, channel-wise, and patch-wise tokenization. However, previous studies still have limitations in time complexity, computational resources, and cross-dimensional interactions. To address these limitations, we introduce a novel CNN Autoencoder-based Score Attention mechanism (CASA), which can be introduced in diverse Transformers model-agnosticically by reducing memory and leading to improvement in model performance. Experiments on eight real-world datasets validate that CASA decreases computational resources by up to 77.7%, accelerates inference by 44.0%, and achieves state-of-the-art performance, ranking first in 87.5% of evaluated metrics.
20.2NAApr 8
Continuum dynamics from quantised interaction rulesPark Junhu, Yongsoo Ha, Myungjoo Kang
Conservative dynamics are typically computed as floating-point approximations to continuum differential operators, which can obscure conservation through rounding and discretisation artefacts. Here we instead formulate conservative evolution directly as quantised interaction rules acting on countable states. The resulting Fast Quantised Numerical Method (FQNM) executes dynamics through antisymmetric integer transfer, with physical fields appearing only after reconstruction. In high-frequency transport, the method remains accurate deep into the Nyquist regime where a standard high-order floating-point baseline deteriorates. In nonlinear shock formation, it preserves grid-level structure and remains robust to cell drifting while maintaining exact discrete conservation. These results show that conservative dynamics can be executed directly through discrete interaction rules, with continuum behaviour emerging only as a reconstruction of underlying quantised states.
CVJul 30, 2025
MINR: Implicit Neural Representations with Masked Image ModellingSua Lee, Joonhun Lee, Myungjoo Kang
Self-supervised learning methods like masked autoencoders (MAE) have shown significant promise in learning robust feature representations, particularly in image reconstruction-based pretraining task. However, their performance is often strongly dependent on the masking strategies used during training and can degrade when applied to out-of-distribution data. To address these limitations, we introduce the masked implicit neural representations (MINR) framework that synergizes implicit neural representations with masked image modeling. MINR learns a continuous function to represent images, enabling more robust and generalizable reconstructions irrespective of masking strategies. Our experiments demonstrate that MINR not only outperforms MAE in in-domain scenarios but also in out-of-distribution settings, while reducing model complexity. The versatility of MINR extends to various self-supervised learning applications, confirming its utility as a robust and efficient alternative to existing frameworks.
CVJan 31, 2024
Spatial-and-Frequency-aware Restoration method for Images based on Diffusion ModelsKyungsung Lee, Donggyu Lee, Myungjoo Kang
Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with established methods. Existing methods for solving noisy inverse problems in IR, considers the pixel-wise data-fidelity. In this paper, we propose SaFaRI, a spatial-and-frequency-aware diffusion model for IR with Gaussian noise. Our model encourages images to preserve data-fidelity in both the spatial and frequency domains, resulting in enhanced reconstruction quality. We comprehensively evaluate the performance of our model on a variety of noisy inverse problems, including inpainting, denoising, and super-resolution. Our thorough evaluation demonstrates that SaFaRI achieves state-of-the-art performance on both the ImageNet datasets and FFHQ datasets, outperforming existing zero-shot IR methods in terms of LPIPS and FID metrics.
CVOct 2, 2025
Leveraging Prior Knowledge of Diffusion Model for Person SearchGiyeol Kim, Sooyoung Yang, Jihyong Oh et al.
Person search aims to jointly perform person detection and re-identification by localizing and identifying a query person within a gallery of uncropped scene images. Existing methods predominantly utilize ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context and fine-grained identity cues necessary for person search. Moreover, they rely on a shared backbone feature for both person detection and re-identification, leading to suboptimal features due to conflicting optimization objectives. In this paper, we propose DiffPS (Diffusion Prior Knowledge for Person Search), a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between two sub-tasks. We analyze key properties of diffusion priors and propose three specialized modules: (i) Diffusion-Guided Region Proposal Network (DGRPN) for enhanced person localization, (ii) Multi-Scale Frequency Refinement Network (MSFRN) to mitigate shape bias, and (iii) Semantic-Adaptive Feature Aggregation Network (SFAN) to leverage text-aligned diffusion features. DiffPS sets a new state-of-the-art on CUHK-SYSU and PRW.
CVJul 9, 2025
Divergence-Based Similarity Function for Multi-View Contrastive LearningJae Hyoung Jeon, Cheolsu Lim, Myungjoo Kang
Recent success in contrastive learning has sparked growing interest in more effectively leveraging multiple augmented views of an instance. While prior methods incorporate multiple views at the loss or feature level, they primarily capture pairwise relationships and fail to model the joint structure across all views. In this work, we propose a divergence-based similarity function (DSF) that explicitly captures the joint structure by representing each set of augmented views as a distribution and measuring similarity as the divergence between distributions. Extensive experiments demonstrate that DSF consistently improves performance across various tasks, including kNN classification and linear evaluation, while also offering greater efficiency compared to other multi-view methods. Furthermore, we establish a theoretical connection between DSF and cosine similarity, and show that, unlike cosine similarity, DSF operates effectively without requiring a temperature hyperparameter.
MTRL-SCIJun 21, 2025
Residual Connection-Enhanced ConvLSTM for Lithium Dendrite Growth PredictionHosung Lee, Byeongoh Hwang, Dasan Kim et al.
The growth of lithium dendrites significantly impacts the performance and safety of rechargeable batteries, leading to short circuits and capacity degradation. This study proposes a Residual Connection-Enhanced ConvLSTM model to predict dendrite growth patterns with improved accuracy and computational efficiency. By integrating residual connections into ConvLSTM, the model mitigates the vanishing gradient problem, enhances feature retention across layers, and effectively captures both localized dendrite growth dynamics and macroscopic battery behavior. The dataset was generated using a phase-field model, simulating dendrite evolution under varying conditions. Experimental results show that the proposed model achieves up to 7% higher accuracy and significantly reduces mean squared error (MSE) compared to conventional ConvLSTM across different voltage conditions (0.1V, 0.3V, 0.5V). This highlights the effectiveness of residual connections in deep spatiotemporal networks for electrochemical system modeling. The proposed approach offers a robust tool for battery diagnostics, potentially aiding in real-time monitoring and optimization of lithium battery performance. Future research can extend this framework to other battery chemistries and integrate it with real-world experimental data for further validation
CVJun 2, 2025
PointT2I: LLM-based text-to-image generation via keypointsTaekyung Lee, Donggyu Lee, Myungjoo Kang
Text-to-image (T2I) generation model has made significant advancements, resulting in high-quality images aligned with an input prompt. However, despite T2I generation's ability to generate fine-grained images, it still faces challenges in accurately generating images when the input prompt contains complex concepts, especially human pose. In this paper, we propose PointT2I, a framework that effectively generates images that accurately correspond to the human pose described in the prompt by using a large language model (LLM). PointT2I consists of three components: Keypoint generation, Image generation, and Feedback system. The keypoint generation uses an LLM to directly generate keypoints corresponding to a human pose, solely based on the input prompt, without external references. Subsequently, the image generation produces images based on both the text prompt and the generated keypoints to accurately reflect the target pose. To refine the outputs of the preceding stages, we incorporate an LLM-based feedback system that assesses the semantic consistency between the generated contents and the given prompts. Our framework is the first approach to leveraging LLM for keypoints-guided image generation without any fine-tuning, producing accurate pose-aligned images based solely on textual prompts.
LGFeb 9, 2025
Neural Shortest Path for Surface Reconstruction from Point CloudsYesom Park, Imseong Park, Jooyoung Hahn et al.
In this paper, we propose the neural shortest path (NSP), a vector-valued implicit neural representation (INR) that approximates a distance function and its gradient. The key feature of NSP is to learn the exact shortest path (ESP), which directs an arbitrary point to its nearest point on the target surface. The NSP is decomposed into its magnitude and direction, and a variable splitting method is used that each decomposed component approximates a distance function and its gradient, respectively. Unlike to existing methods of learning the distance function itself, the NSP ensures the simultaneous recovery of the distance function and its gradient. We mathematically prove that the decomposed representation of NSP guarantees the convergence of the magnitude of NSP in the $H^1$ norm. Furthermore, we devise a novel loss function that enforces the property of ESP, demonstrating that its global minimum is the ESP. We evaluate the performance of the NSP through comprehensive experiments on diverse datasets, validating its capacity to reconstruct high-quality surfaces with the robustness to noise and data sparsity. The numerical results show substantial improvements over state-of-the-art methods, highlighting the importance of learning the ESP, the product of distance function and its gradient, for representing a wide variety of complex surfaces.
SDJan 2, 2025
MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical CaptionsSuhwan Choi, Kyu Won Kim, Myungjoo Kang
We introduce Multimodal Matching based on Valence and Arousal (MMVA), a tri-modal encoder framework designed to capture emotional content across images, music, and musical captions. To support this framework, we expand the Image-Music-Emotion-Matching-Net (IMEMNet) dataset, creating IMEMNet-C which includes 24,756 images and 25,944 music clips with corresponding musical captions. We employ multimodal matching scores based on the continuous valence (emotional positivity) and arousal (emotional intensity) values. This continuous matching score allows for random sampling of image-music pairs during training by computing similarity scores from the valence-arousal values across different modalities. Consequently, the proposed approach achieves state-of-the-art performance in valence-arousal prediction tasks. Furthermore, the framework demonstrates its efficacy in various zeroshot tasks, highlighting the potential of valence and arousal predictions in downstream applications.
COMP-PHApr 24, 2024
Neural Operators Learn the Local Physics of MagnetohydrodynamicsTaeyoung Kim, Youngsoo Ha, Myungjoo Kang
Magnetohydrodynamics (MHD) plays a pivotal role in describing the dynamics of plasma and conductive fluids, essential for understanding phenomena such as the structure and evolution of stars and galaxies, and in nuclear fusion for plasma motion through ideal MHD equations. Solving these hyperbolic PDEs requires sophisticated numerical methods, presenting computational challenges due to complex structures and high costs. Recent advances introduce neural operators like the Fourier Neural Operator (FNO) as surrogate models for traditional numerical analyses. This study explores a modified Flux Fourier neural operator model to approximate the numerical flux of ideal MHD, offering a novel approach that outperforms existing neural operator models by enabling continuous inference, generalization outside sampled distributions, and faster computation compared to classical numerical schemes.
IRApr 23, 2024
Revealing and Utilizing In-group Favoritism for Graph-based Collaborative FilteringHoin Jung, Hyunsoo Cho, Myungje Choi et al.
When it comes to a personalized item recommendation system, It is essential to extract users' preferences and purchasing patterns. Assuming that users in the real world form a cluster and there is common favoritism in each cluster, in this work, we introduce Co-Clustering Wrapper (CCW). We compute co-clusters of users and items with co-clustering algorithms and add CF subnetworks for each cluster to extract the in-group favoritism. Combining the features from the networks, we obtain rich and unified information about users. We experimented real world datasets considering two aspects: Finding the number of groups divided according to in-group preference, and measuring the quantity of improvement of the performance.
LGMay 24, 2023
Feature-aligned N-BEATS with Sinkhorn divergenceJoonhun Lee, Myeongho Jeon, Myungjoo Kang et al.
We propose Feature-aligned N-BEATS as a domain-generalized time series forecasting model. It is a nontrivial extension of N-BEATS with doubly residual stacking principle (Oreshkin et al. [45]) into a representation learning framework. In particular, it revolves around marginal feature probability measures induced by the intricate composition of residual and feature extracting operators of N-BEATS in each stack and aligns them stack-wise via an approximate of an optimal transport distance referred to as the Sinkhorn divergence. The training loss consists of an empirical risk minimization from multiple source domains, i.e., forecasting loss, and an alignment loss calculated with the Sinkhorn divergence, which allows the model to learn invariant features stack-wise across multiple source data sequences while retaining N-BEATS's interpretable design and forecasting power. Comprehensive experimental evaluations with ablation studies are provided and the corresponding results demonstrate the proposed model's forecasting and generalization capabilities.
LGJun 15, 2021
Robust Out-of-Distribution Detection on Deep Probabilistic Generative ModelsJaemoo Choi, Changyeon Yoon, Jeongwoo Bae et al.
Out-of-distribution (OOD) detection is an important task in machine learning systems for ensuring their reliability and safety. Deep probabilistic generative models facilitate OOD detection by estimating the likelihood of a data sample. However, such models frequently assign a suspiciously high likelihood to a specific outlier. Several recent works have addressed this issue by training a neural network with auxiliary outliers, which are generated by perturbing the input data. In this paper, we discover that these approaches fail for certain OOD datasets. Thus, we suggest a new detection metric that operates without outlier exposure. We observe that our metric is robust to diverse variations of an image compared to the previous outlier-exposing methods. Furthermore, our proposed score requires neither auxiliary models nor additional training. Instead, this paper utilizes the likelihood ratio statistic in a new perspective to extract genuine properties from the given single deep probabilistic generative model. We also apply a novel numerical approximation to enable fast implementation. Finally, we demonstrate comprehensive experiments on various probabilistic generative models and show that our method achieves state-of-the-art performance.
CVJun 13, 2021
Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANsJaewoong Choi, Junho Lee, Changyeon Yoon et al.
The discovery of the disentanglement properties of the latent space in GANs motivated a lot of research to find the semantically meaningful directions on it. In this paper, we suggest that the disentanglement property is closely related to the geometry of the latent space. In this regard, we propose an unsupervised method for finding the semantic-factorizing directions on the intermediate latent space of GANs based on the local geometry. Intuitively, our proposed method, called Local Basis, finds the principal variation of the latent space in the neighborhood of the base latent variable. Experimental results show that the local principal variation corresponds to the semantic factorization and traversing along it provides strong robustness to image traversal. Moreover, we suggest an explanation for the limited success in finding the global traversal directions in the latent space, especially W-space of StyleGAN2. We show that W-space is warped globally by comparing the local geometry, discovered from Local Basis, through the metric on Grassmannian Manifold. The global warpage implies that the latent space is not well-aligned globally and therefore the global traversal directions are bound to show limited success on it.
CVMay 25, 2021
High-Frequency aware Perceptual Image EnhancementHyungmin Roh, Myungjoo Kang
In this paper, we introduce a novel deep neural network suitable for multi-scale analysis and propose efficient model-agnostic methods that help the network extract information from high-frequency domains to reconstruct clearer images. Our model can be applied to multi-scale image enhancement problems including denoising, deblurring and single image super-resolution. Experiments on SIDD, Flickr2K, DIV2K, and REDS datasets show that our method achieves state-of-the-art performance on each task. Furthermore, we show that our model can overcome the over-smoothing problem commonly observed in existing PSNR-oriented methods and generate more natural high-resolution images by applying adversarial training.
LGJan 7, 2021
OAAE: Adversarial Autoencoders for Novelty Detection in Multi-modal Normality Case via Orthogonalized Latent SpaceSungkwon An, Jeonghoon Kim, Myungjoo Kang et al.
Novelty detection using deep generative models such as autoencoder, generative adversarial networks mostly takes image reconstruction error as novelty score function. However, image data, high dimensional as it is, contains a lot of different features other than class information which makes models hard to detect novelty data. The problem gets harder in multi-modal normality case. To address this challenge, we propose a new way of measuring novelty score in multi-modal normality cases using orthogonalized latent space. Specifically, we employ orthogonal low-rank embedding in the latent space to disentangle the features in the latent space using mutual class information. With the orthogonalized latent space, novelty score is defined by the change of each latent vector. Proposed algorithm was compared to state-of-the-art novelty detection algorithms using GAN such as RaPP and OCGAN, and experimental results show that ours outperforms those algorithms.
CVSep 29, 2020
MCW-Net: Single Image Deraining with Multi-level Connections and Wide Regional Non-local BlocksYeachan Park, Myeongho Jeon, Junho Lee et al.
A recent line of convolutional neural network-based works has succeeded in capturing rain streaks. However, difficulties in detailed recovery still remain. In this paper, we present a multi-level connection and wide regional non-local block network (MCW-Net) to properly restore the original background textures in rainy images. Unlike existing encoder-decoder-based image deraining models that improve performance with additional branches, MCW-Net improves performance by maximizing information utilization without additional branches through the following two proposed methods. The first method is a multi-level connection that repeatedly connects multi-level features of the encoder network to the decoder network. Multi-level connection encourages the decoding process to use the feature information of all levels. In multi-level connection, channel-wise attention is considered to learn which level of features is important in the decoding process of the current level. The second method is a wide regional non-local block. As rain streaks primarily exhibit a vertical distribution, we divide the grid of the image into horizontally-wide patches and apply a non-local operation to each region to explore the rich rain-free background information. Experimental results on both synthetic and real-world rainy datasets demonstrate that the proposed model significantly outperforms existing state-of-the-art models. Furthermore, the results of the joint deraining and segmentation experiment prove that our model contributes effectively to other vision tasks.
LGSep 17, 2020
Discond-VAE: Disentangling Continuous Factors from the DiscreteJaewoong Choi, Geonho Hwang, Myungjoo Kang
In the real-world data, there are common variations shared by all classes (e.g. category label) and exclusive variations of each class. We propose a variant of VAE capable of disentangling both of these variations. To represent these generative factors of data, we introduce two sets of continuous latent variables, private variable and public variable. Our proposed framework models the private variable as a Mixture of Gaussian and the public variable as a Gaussian, respectively. Each mode of the private variable is responsible for a class of the discrete variable. Most of the previous attempts to integrate the discrete generative factors to disentanglement assume statistical independence between the continuous and discrete variables. Our proposed model, which we call Discond-VAE, DISentangles the class-dependent CONtinuous factors from the Discrete factors by introducing the private variables. The experiments show that Discond-VAE can discover the private and public factors from data. Moreover, even under the dataset with only public factors, Discond-VAE does not fail and adapts the private variables to represent the public factors.
CVJul 9, 2020
Blur Invariant Kernel-Adaptive Network for Single Image Blind deblurringSungkwon An, Hyungmin Roh, Myungjoo Kang
We present a novel, blind, single image deblurring method that utilizes information regarding blur kernels. Our model solves the deblurring problem by dividing it into two successive tasks: (1) blur kernel estimation and (2) sharp image restoration. We first introduce a kernel estimation network that produces adaptive blur kernels based on the analysis of the blurred image. The network learns the blur pattern of the input image and trains to generate the estimation of image-specific blur kernels. Subsequently, we propose a deblurring network that restores sharp images using the estimated blur kernel. To use the kernel efficiently, we propose a kernel-adaptive AE block that encodes features from both blurred images and blur kernels into a low dimensional space and then decodes them simultaneously to obtain an appropriately synthesized feature representation. We evaluate our model on REDS, GOPRO and Flickr2K datasets using various Gaussian blur kernels. Experiments show that our model can achieve state-of-the-art results on each dataset.
CVMay 8, 2020
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and ResultsAbdelrahman Abdelhamed, Mahmoud Afifi, Radu Timofte et al.
This paper reviews the NTIRE 2020 challenge on real image denoising with focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on a newly collected validation and testing image datasets, and hence, named SIDD+. This challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The proposed methods by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.
CVJan 12, 2020
Membership Inference Attacks Against Object Detection ModelsYeachan Park, Myungjoo Kang
Machine learning models can leak information regarding the dataset they have trained. In this paper, we present the first membership inference attack against black-boxed object detection models that determines whether the given data records are used in the training. To attack the object detection model, we devise a novel method named as called a canvas method, in which predicted bounding boxes are drawn on an empty image for the attack model input. Based on the experiments, we successfully reveal the membership status of privately sensitive data trained using one-stage and two-stage detection models. We then propose defense strategies and also conduct a transfer attack between the models and datasets. Our results show that object detection models are also vulnerable to inference attacks like other models.
CVJul 3, 2019
Attention routing between capsulesJaewoong Choi, Hyun Seo, Suii Im et al.
In this paper, we propose a new capsule network architecture called Attention Routing CapsuleNet (AR CapsNet). We replace the dynamic routing and squash activation function of the capsule network with dynamic routing (CapsuleNet) with the attention routing and capsule activation. The attention routing is a routing between capsules through an attention module. The attention routing is a fast forward-pass while keeping spatial information. On the other hand, the intuitive interpretation of the dynamic routing is finding a centroid of the prediction capsules. Thus, the squash activation function and its variant focus on preserving a vector orientation. However, the capsule activation focuses on performing a capsule-scale activation function. We evaluate our proposed model on the MNIST, affNIST, and CIFAR-10 classification tasks. The proposed model achieves higher accuracy with fewer parameters (x0.65 in the MNIST, x0.82 in the CIFAR-10) and less training time than CapsuleNet (x0.19 in the MNIST, x0.35 in the CIFAR-10). These results validate that designing a capsule-scale operation is a key factor to implement the capsule concept. Also, our experiment shows that our proposed model is transformation equivariant as CapsuleNet. As we perturb each element of the output capsule, the decoder attached to the output capsules shows global variations. Further experiments show that the difference in the capsule features caused by applying affine transformations on an input image is significantly aligned in one direction.
LGFeb 28, 2019
Financial series prediction using Attention LSTMSangyeon Kim, Myungjoo Kang
Financial time series prediction, especially with machine learning techniques, is an extensive field of study. In recent times, deep learning methods (especially time series analysis) have performed outstandingly for various industrial problems, with better prediction than machine learning methods. Moreover, many researchers have used deep learning methods to predict financial time series with various models in recent years. In this paper, we will compare various deep learning models, such as multilayer perceptron (MLP), one-dimensional convolutional neural networks (1D CNN), stacked long short-term memory (stacked LSTM), attention networks, and weighted attention networks for financial time series prediction. In particular, attention LSTM is not only used for prediction, but also for visualizing intermediate outputs to analyze the reason of prediction; therefore, we will show an example for understanding the model prediction intuitively with attention vectors. In addition, we focus on time and factors, which lead to an easy understanding of why certain trends are predicted when accessing a given time series table. We also modify the loss functions of the attention models with weighted categorical cross entropy; our proposed model produces a 0.76 hit ratio, which is superior to those of other methods for predicting the trends of the KOSPI 200.