CVSep 30, 2024Code
Illustrious: an Open Advanced Illustration ModelSang Hyun Park, Jun Young Koh, Junha Lee et al.
In this work, we share the insights for achieving state-of-the-art quality in our text-to-image anime image generative model, called Illustrious. To achieve high resolution, dynamic color range images, and high restoration ability, we focus on three critical approaches for model improvement. First, we delve into the significance of the batch size and dropout control, which enables faster learning of controllable token based concept activations. Second, we increase the training resolution of images, affecting the accurate depiction of character anatomy in much higher resolution, extending its generation capability over 20MP with proper methods. Finally, we propose the refined multi-level captions, covering all tags and various natural language captions as a critical factor for model development. Through extensive analysis and experiments, Illustrious demonstrates state-of-the-art performance in terms of animation style, outperforming widely-used models in illustration domains, propelling easier customization and personalization with nature of open source. We plan to publicly release updated Illustrious model series sequentially as well as sustainable plans for improvements.
MLJan 11, 2023
ODIM: Outlier Detection via Likelihood of Under-Fitted Generative ModelsDongha Kim, Jaesung Hwang, Jongjin Lee et al.
The unsupervised outlier detection (UOD) problem refers to a task to identify inliers given training data which contain outliers as well as inliers, without any labeled information about inliers and outliers. It has been widely recognized that using fully-trained likelihood-based deep generative models (DGMs) often results in poor performance in distinguishing inliers from outliers. In this study, we claim that the likelihood itself could serve as powerful evidence for identifying inliers in UOD tasks, provided that DGMs are carefully under-fitted. Our approach begins with a novel observation called the inlier-memorization (IM) effect-when training a deep generative model with data including outliers, the model initially memorizes inliers before outliers. Based on this finding, we develop a new method called the outlier detection via the IM effect (ODIM). Remarkably, the ODIM requires only a few updates, making it computationally efficient-at least tens of times faster than other deep-learning-based algorithms. Also, the ODIM filters out outliers excellently, regardless of the data type, including tabular, image, and text data. To validate the superiority and efficiency of our method, we provide extensive empirical analyses on close to 60 datasets.
62.1IRMar 18
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential RecommendationJunyoung Kim, Woojoo Kim, Jaehyung Lim et al.
Sequential Recommendation (SR) in multimodal settings typically relies on small frozen pretrained encoders, which limits semantic capacity and prevents Collaborative Filtering (CF) signals from being fully integrated into item representations. Inspired by the recent success of Large Language Models (LLMs) as high-capacity embedders, we investigate the use of Vision-Language Models (VLMs) as CF-aware multimodal encoders for SR. However, we find that standard contrastive supervised fine-tuning (SFT), which adapts VLMs for embedding generation and injects CF signals, can amplify its inherent modality collapse. In this state, optimization is dominated by a single modality while the other degrades, ultimately undermining recommendation accuracy. To address this, we propose VLM2Rec, a VLM embedder-based framework for multimodal sequential recommendation designed to ensure balanced modality utilization. Specifically, we introduce Weak-modality Penalized Contrastive Learning to rectify gradient imbalance during optimization and Cross-Modal Relational Topology Regularization to preserve geometric consistency between modalities. Extensive experiments demonstrate that VLM2Rec consistently outperforms state-of-the-art baselines in both accuracy and robustness across diverse scenarios.
MLAug 2, 2024
META-ANOVA: Screening interactions for interpretable machine learningYongchan Choi, Seokhun Park, Chanmoo Park et al.
There are two things to be considered when we evaluate predictive models. One is prediction accuracy,and the other is interpretability. Over the recent decades, many prediction models of high performance, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex, making it difficult to intuitively interpret their predictions. This complexity in interpretation limits their use in many real-world fields that require accountability, such as medicine, finance, and college admissions. In this study, we develop a novel method called Meta-ANOVA to provide an interpretable model for any given prediction model. The basic idea of Meta-ANOVA is to transform a given black-box prediction model to the functional ANOVA model. A novel technical contribution of Meta-ANOVA is a procedure of screening out unnecessary interaction before transforming a given black-box model to the functional ANOVA model. This screening procedure allows the inclusion of higher order interactions in the transformed functional ANOVA model without computational difficulties. We prove that the screening procedure is asymptotically consistent. Through various experiments with synthetic and real-world datasets, we empirically demonstrate the superiority of Meta-ANOVA
33.1CVMar 28
Let Triggers Control: Frequency-Aware Dropout for Effective Token ControlJunyoung Koh, Hoyeon Moon, Dongha Kim et al.
Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models -- commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token -- has become a significant area of research. Previous works have naively assumed that fine-tuning with a single trigger token to represent new concepts. However, this often results in poor controllability, where the trigger token alone fails to reliably evoke the intended concept. We attribute this issue to the frequent co-occurrence of the trigger token with the surrounding context during fine-tuning, which entangles their representations and compromises the token's semantic distinctiveness. To disentangle this, we propose Frequency-Aware Dropout (FAD) -- a novel regularization technique that improves prompt controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. Qualitative and quantitative analyses across token-based diffusion models (SD~1.5 and SDXL) and natural language--driven backbones (FLUX and Qwen-Image) demonstrate consistent gains in prompt fidelity, stylistic precision, and user-perceived quality. Our method provides a simple yet effective dropout strategy that enhances controllability and personalization in text-to-image generation. Notably, it achieves these improvements without introducing additional parameters or architectural modifications, making it readily applicable to existing models with minimal computational overhead.
35.3CLMar 15
Automatic Inter-document Multi-hop Scientific QA GenerationSeungmin Lee, Dongha Kim, Yuni Jeon et al.
Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA extracts single-hop QAs using large language models (LLMs) with machine reading comprehension and constructs cross-document relations based on embedding-based semantic alignment while selectively leveraging citation information. Applied to 8,211 PubMed Central papers, it produced 411,409 single-hop and 13,672 multi-hop QAs, forming the IM-SciQA dataset. Human and automatic validation confirmed high factual consistency, and experimental results demonstrate that IM-SciQA effectively differentiates reasoning capabilities across retrieval and QA stages, providing a realistic and interpretable benchmark for retrieval-augmented scientific reasoning. We further extend this framework to construct CIM-SciQA, a citation-guided variant achieving comparable performance to the Oracle setting, reinforcing the dataset's validity and generality.
MLAug 19, 2024
ALTBI: Constructing Improved Outlier Detection Models via Optimization of Inlier-Memorization EffectSeoyoung Cho, Jaesung Hwang, Kwan-Young Bak et al.
Outlier detection (OD) is the task of identifying unusual observations (or outliers) from a given or upcoming data by learning unique patterns of normal observations (or inliers). Recently, a study introduced a powerful unsupervised OD (UOD) solver based on a new observation of deep generative models, called inlier-memorization (IM) effect, which suggests that generative models memorize inliers before outliers in early learning stages. In this study, we aim to develop a theoretically principled method to address UOD tasks by maximally utilizing the IM effect. We begin by observing that the IM effect is observed more clearly when the given training data contain fewer outliers. This finding indicates a potential for enhancing the IM effect in UOD regimes if we can effectively exclude outliers from mini-batches when designing the loss function. To this end, we introduce two main techniques: 1) increasing the mini-batch size as the model training proceeds and 2) using an adaptive threshold to calculate the truncated loss function. We theoretically show that these two techniques effectively filter out outliers from the truncated loss function, allowing us to utilize the IM effect to the fullest. Coupled with an additional ensemble strategy, we propose our method and term it Adaptive Loss Truncation with Batch Increment (ALTBI). We provide extensive experimental results to demonstrate that ALTBI achieves state-of-the-art performance in identifying outliers compared to other recent methods, even with significantly lower computation costs. Additionally, we show that our method yields robust performances when combined with privacy-preserving algorithms.
MLJan 16
Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier DetectionMinseo Kang, Seunghwan Park, Dongha Kim
Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regime-without any information about anomalous instances in the training data-is challenging. A recently observed phenomenon, known as the inlier-memorization (IM) effect, where deep generative models (DGMs) tend to memorize inlier patterns during early training, provides a promising signal for distinguishing outliers. However, existing unsupervised approaches that rely solely on the IM effect still struggle when inliers and outliers are not well-separated or when outliers form dense clusters. To address these limitations, we incorporate active learning to selectively acquire informative labels, and propose IMBoost, a novel framework that explicitly reinforces the IM effect to improve outlier detection. Our method consists of two stages: 1) a warm-up phase that induces and promotes the IM effect, and 2) a polarization phase in which actively queried samples are used to maximize the discrepancy between inlier and outlier scores. In particular, we propose a novel query strategy and tailored loss function in the polarization phase to effectively identify informative samples and fully leverage the limited labeling budget. We provide a theoretical analysis showing that the IMBoost consistently decreases inlier risk while increasing outlier risk throughout training, thereby amplifying their separation. Extensive experiments on diverse benchmark datasets demonstrate that IMBoost not only significantly outperforms state-of-the-art active OD methods but also requires substantially less computational cost.
QUANT-PHMar 28, 2024
Optimizing Quantum Convolutional Neural Network Architectures for Arbitrary Data DimensionChangwon Lee, Israel F. Araujo, Dongha Kim et al.
Quantum convolutional neural networks (QCNNs) represent a promising approach in quantum machine learning, paving new directions for both quantum and classical data analysis. This approach is particularly attractive due to the absence of the barren plateau problem, a fundamental challenge in training quantum neural networks (QNNs), and its feasibility. However, a limitation arises when applying QCNNs to classical data. The network architecture is most natural when the number of input qubits is a power of two, as this number is reduced by a factor of two in each pooling layer. The number of input qubits determines the dimensions (i.e. the number of features) of the input data that can be processed, restricting the applicability of QCNN algorithms to real-world data. To address this issue, we propose a QCNN architecture capable of handling arbitrary input data dimensions while optimizing the allocation of quantum resources such as ancillary qubits and quantum gates. This optimization is not only important for minimizing computational resources, but also essential in noisy intermediate-scale quantum (NISQ) computing, as the size of the quantum circuits that can be executed reliably is limited. Through numerical simulations, we benchmarked the classification performance of various QCNN architectures when handling arbitrary input data dimensions on the MNIST and Breast Cancer datasets. The results validate that the proposed QCNN architecture achieves excellent classification performance while utilizing a minimal resource overhead, providing an optimal solution when reliable quantum computation is constrained by noise and imperfections.
CLMay 15, 2025
Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge GraphDeeksha Prahlad, Chanhee Lee, Dongha Kim et al.
The advent of large language models (LLMs) has allowed numerous applications, including the generation of queried responses, to be leveraged in chatbots and other conversational assistants. Being trained on a plethora of data, LLMs often undergo high levels of over-fitting, resulting in the generation of extra and incorrect data, thus causing hallucinations in output generation. One of the root causes of such problems is the lack of timely, factual, and personalized information fed to the LLM. In this paper, we propose an approach to address these problems by introducing retrieval augmented generation (RAG) using knowledge graphs (KGs) to assist the LLM in personalized response generation tailored to the users. KGs have the advantage of storing continuously updated factual information in a structured way. While our KGs can be used for a variety of frequently updated personal data, such as calendar, contact, and location data, we focus on calendar data in this paper. Our experimental results show that our approach works significantly better in understanding personal information and generating accurate responses compared to the baseline LLMs using personal data as text inputs, with a moderate reduction in response time.
LGAug 6, 2025
Federated Continual RecommendationJaehyung Lim, Wonbin Kweon, Woojoo Kim et al.
The increasing emphasis on privacy in recommendation systems has led to the adoption of Federated Learning (FL) as a privacy-preserving solution, enabling collaborative training without sharing user data. While Federated Recommendation (FedRec) effectively protects privacy, existing methods struggle with non-stationary data streams, failing to maintain consistent recommendation quality over time. On the other hand, Continual Learning Recommendation (CLRec) methods address evolving user preferences but typically assume centralized data access, making them incompatible with FL constraints. To bridge this gap, we introduce Federated Continual Recommendation (FCRec), a novel task that integrates FedRec and CLRec, requiring models to learn from streaming data while preserving privacy. As a solution, we propose F3CRec, a framework designed to balance knowledge retention and adaptation under the strict constraints of FCRec. F3CRec introduces two key components: Adaptive Replay Memory on the client side, which selectively retains past preferences based on user-specific shifts, and Item-wise Temporal Mean on the server side, which integrates new knowledge while preserving prior information. Extensive experiments demonstrate that F3CRec outperforms existing approaches in maintaining recommendation quality over time in a federated environment.
MLJul 11, 2025
MIRRAMS: Learning Robust Tabular Models under Unseen Missingness ShiftsJihye Lee, Minseo Kang, Dongha Kim
The presence of missing values often reflects variations in data collection policies, which may shift across time or locations, even when the underlying feature distribution remains stable. Such shifts in the missingness distribution between training and test inputs pose a significant challenge to achieving robust predictive performance. In this study, we propose a novel deep learning framework designed to address this challenge, particularly in the common yet challenging scenario where the test-time dataset is unseen. We begin by introducing a set of mutual information-based conditions, called MI robustness conditions, which guide the prediction model to extract label-relevant information. This promotes robustness against distributional shifts in missingness at test-time. To enforce these conditions, we design simple yet effective loss terms that collectively define our final objective, called MIRRAMS. Importantly, our method does not rely on any specific missingness assumption such as MCAR, MAR, or MNAR, making it applicable to a broad range of scenarios. Furthermore, it can naturally extend to cases where labels are also missing in training data, by generalizing the framework to a semi-supervised learning setting. Extensive experiments across multiple benchmark tabular datasets demonstrate that MIRRAMS consistently outperforms existing state-of-the-art baselines and maintains stable performance under diverse missingness conditions. Moreover, it achieves superior performance even in fully observed settings, highlighting MIRRAMS as a powerful, off-the-shelf framework for general-purpose tabular learning.
CLOct 30, 2024
Beyond Ontology in Dialogue State Tracking for Goal-Oriented ChatbotSejin Lee, Dongha Kim, Min Song
Goal-oriented chatbots are essential for automating user tasks, such as booking flights or making restaurant reservations. A key component of these systems is Dialogue State Tracking (DST), which interprets user intent and maintains the dialogue state. However, existing DST methods often rely on fixed ontologies and manually compiled slot values, limiting their adaptability to open-domain dialogues. We propose a novel approach that leverages instruction tuning and advanced prompt strategies to enhance DST performance, without relying on any predefined ontologies. Our method enables Large Language Model (LLM) to infer dialogue states through carefully designed prompts and includes an anti-hallucination mechanism to ensure accurate tracking in diverse conversation contexts. Additionally, we employ a Variational Graph Auto-Encoder (VGAE) to model and predict subsequent user intent. Our approach achieved state-of-the-art with a JGA of 42.57% outperforming existing ontology-less DST models, and performed well in open-domain real-world conversations. This work presents a significant advancement in creating more adaptive and accurate goal-oriented chatbots.
MLFeb 7, 2022
Learning fair representation with a parametric integral probability metricDongha Kim, Kunwoong Kim, Insung Kong et al.
As they have a vital effect on social decision-making, AI algorithms should be not only accurate but also fair. Among various algorithms for fairness AI, learning fair representation (LFR), whose goal is to find a fair representation with respect to sensitive variables such as gender and race, has received much attention. For LFR, the adversarial training scheme is popularly employed as is done in the generative adversarial network type algorithms. The choice of a discriminator, however, is done heuristically without justification. In this paper, we propose a new adversarial training scheme for LFR, where the integral probability metric (IPM) with a specific parametric family of discriminators is used. The most notable result of the proposed LFR algorithm is its theoretical guarantee about the fairness of the final prediction model, which has not been considered yet. That is, we derive theoretical relations between the fairness of representation and the fairness of the prediction model built on the top of the representation (i.e., using the representation as the input). Moreover, by numerical experiments, we show that our proposed LFR algorithm is computationally lighter and more stable, and the final prediction model is competitive or superior to other LFR algorithms using more complex discriminators.
LGJun 29, 2021
INN: A Method Identifying Clean-annotated Samples via Consistency Effect in Deep Neural NetworksDongha Kim, Yongchan Choi, Kunwoong Kim et al.
In many classification problems, collecting massive clean-annotated data is not easy, and thus a lot of researches have been done to handle data with noisy labels. Most recent state-of-art solutions for noisy label problems are built on the small-loss strategy which exploits the memorization effect. While it is a powerful tool, the memorization effect has several drawbacks. The performances are sensitive to the choice of a training epoch required for utilizing the memorization effect. In addition, when the labels are heavily contaminated or imbalanced, the memorization effect may not occur in which case the methods based on the small-loss strategy fail to identify clean labeled data. We introduce a new method called INN(Integration with the Nearest Neighborhoods) to refine clean labeled data from training data with noisy labels. The proposed method is based on a new discovery that a prediction pattern at neighbor regions of clean labeled data is consistently different from that of noisy labeled data regardless of training epochs. The INN method requires more computation but is much stable and powerful than the small-loss strategy. By carrying out various experiments, we demonstrate that the INN method resolves the shortcomings in the memorization effect successfully and thus is helpful to construct more accurate deep prediction models with training data with noisy labels.
MLMay 9, 2021
A likelihood approach to nonparametric estimation of a singular distribution using deep generative modelsMinwoo Chae, Dongha Kim, Yongdai Kim et al.
We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly.
LGDec 4, 2020
Kernel-convoluted Deep Neural Networks with Data AugmentationMinjin Kim, Young-geun Kim, Dongha Kim et al.
The Mixup method (Zhang et al. 2018), which uses linearly interpolated data, has emerged as an effective data augmentation tool to improve generalization performance and the robustness to adversarial examples. The motivation is to curtail undesirable oscillations by its implicit model constraint to behave linearly at in-between observed data points and promote smoothness. In this work, we formally investigate this premise, propose a way to explicitly impose smoothness constraints, and extend it to incorporate with implicit model constraints. First, we derive a new function class composed of kernel-convoluted models (KCM) where the smoothness constraint is directly imposed by locally averaging the original functions with a kernel function. Second, we propose to incorporate the Mixup method into KCM to expand the domains of smoothness. In both cases of KCM and the KCM adapted with the Mixup, we provide risk analysis, respectively, under some conditions for kernels. We show that the upper bound of the excess risk is not slower than that of the original function class. The upper bound of the KCM with the Mixup remains dominated by that of the KCM if the perturbation of the Mixup vanishes faster than \(O(n^{-1/2})\) where \(n\) is a sample size. Using CIFAR-10 and CIFAR-100 datasets, our experiments demonstrate that the KCM with the Mixup outperforms the Mixup method in terms of generalization and robustness to adversarial examples.
MLSep 15, 2019
Understanding and Improving Virtual Adversarial TrainingDongha Kim, Yongchan Choi, Yongdai Kim
In semi-supervised learning, virtual adversarial training (VAT) approach is one of the most attractive method due to its intuitional simplicity and powerful performances. VAT finds a classifier which is robust to data perturbation toward the adversarial direction. In this study, we provide a fundamental explanation why VAT works well in semi-supervised learning case and propose new techniques which are simple but powerful to improve the VAT method. Especially we employ the idea of Bad GAN approach, which utilizes bad samples distributed on complement of the support of the input data, without any additional deep generative architectures. We generate bad samples of high-quality by use of the adversarial training used in VAT and also give theoretical explanations why the adversarial training is good at both generating bad samples. An advantage of our proposed method is to achieve the competitive performances compared with other recent studies with much fewer computations. We demonstrate advantages our method by various experiments with well known benchmark image datasets.
MLDec 10, 2018
Fast convergence rates of deep neural networks for classificationYongdai Kim, Ilsang Ohn, Dongha Kim
We derive the fast convergence rates of a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for a true model: (1) a smooth decision boundary, (2) smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast rate convergences for all three cases provided that the architecture (i.e., the number of layers, number of nodes and sparsity). is carefully selected. An important implication is that DNN architectures are very flexible for use in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy, and show that the DNN classifier achieves a fast convergence rate under the condition that the conditional class probabilities of most data are sufficiently close to either 1 or zero. This assumption is not unusual for image recognition because human beings are extremely good at recognizing most images. To confirm our theoretical explanation, we present the results of a small numerical study conducted to compare the hinge loss and cross-entropy.
MLDec 2, 2018
On variation of gradients of deep neural networksYongdai Kim, Dongha Kim
We provide a theoretical explanation of the role of the number of nodes at each layer in deep neural networks. We prove that the largest variation of a deep neural network with ReLU activation function arises when the layer with the fewest nodes changes its activation pattern. An important implication is that deep neural network is a useful tool to generate functions most of whose variations are concentrated on a smaller area of the input space near the boundaries corresponding to the layer with the fewest nodes. In turn, this property makes the function more invariant to input transformation. That is, our theoretical result gives a clue about how to design the architecture of a deep neural network to increase complexity and transformation invariancy simultaneously.