Dohyun Kim

CV
h-index6
20papers
370citations
Novelty49%
AI Score58

20 Papers

96.7LGMay 26Code
Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference

Jaewoo Lee, Hyeongyu Kang, Dohyun Kim et al.

Aligning a few-step generative model is challenging, since existing alignment frameworks typically rely on restrictive assumptions: a tractable likelihood, a specific ODE/SDE solver, or a particular model family. We introduce FAV, Few-step Generative Models Alignment via Sample-based Variational Inference, a general alignment framework that requires only sample access to the generator and the reference distribution. We cast alignment as sampling from a reward-tilted distribution anchored to a reference distribution. We leverage Stein Variational Gradient Descent as a sample-based variational inference scheme and amortize its particle updates into the generator parameters via fixed-point regression. We evaluate FAV on two domains: robotics manipulation and image generator alignment. On generative policy alignment for robotic manipulation, FAV outperforms prevailing policy extraction baselines across 56 offline and 30 offline-to-online RL tasks. For image generator alignment, FAV fine-tunes diverse few-step backbones, including GAN, drifting model, consistency models, and flow maps, scaling from ImageNet-$256$ to 1024$^2$ text-to-image synthesis. Code is available at https://github.com/Jaewoopudding/FAV.

79.1CLMay 23Code
Measuring the Depth of LLM Unlearning via Activation Patching

Jaeung Lee, Dohyun Kim, Jaemin Jo

Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxiliary training or dataset-specific adaptations, leaving no generalizable metric. To address these limitations, we propose the Unlearning Depth Score (UDS), a metric that quantifies the mechanistic depth of unlearning via activation patching. UDS first identifies layers that encode the target knowledge using a retain model baseline, then measures how much of it is erased in the unlearned model on a 0-1 scale. In a meta-evaluation across 20 metrics on 150 unlearned models spanning 8 methods, UDS achieves the highest faithfulness and robustness, confirming our causal approach as the most reliable for unlearning evaluation. Case studies further reveal that white-box metrics can disagree at the layer level and that erasure depth varies across examples. We provide guidelines for integrating UDS into existing benchmarking frameworks and streamlining the evaluation pipeline. Code and data are available at https://github.com/gnueaj/unlearning-depth-score

IVAug 20, 2024
ISLES'24: Final Infarct Prediction with Multimodal Imaging and Clinical Data. Where Do We Stand?

Ezequiel de la Rosa, Ruisheng Su, Mauricio Reyes et al.

Accurate estimation of brain infarction (i.e., irreversibly damaged tissue) is critical for guiding treatment decisions in acute ischemic stroke. Reliable infarct prediction informs key clinical interventions, including the need for patient transfer to comprehensive stroke centers, the potential benefit of additional reperfusion attempts during mechanical thrombectomy, decisions regarding secondary neuroprotective treatments, and ultimately, prognosis of clinical outcomes. This work introduces the Ischemic Stroke Lesion Segmentation (ISLES) 2024 challenge, which focuses on the prediction of final infarct volumes from pre-interventional acute stroke imaging and clinical data. ISLES24 provides a comprehensive, multimodal setting where participants can leverage all clinically and practically available data, including full acute CT imaging, sub-acute follow-up MRI, and structured clinical information, across a train set of 150 cases. On the hidden test set of 98 cases, the top-performing model, a multimodal nnU-Net-based architecture, achieved a Dice score of 0.285 (+/- 0.213) and an absolute volume difference of 21.2 (+/- 37.2) mL, underlining the significant challenges posed by this task and the need for further advances in multimodal learning. This work makes two primary contributions: first, we establish a standardized, clinically realistic benchmark for post-treatment infarct prediction, enabling systematic evaluation of multimodal algorithmic strategies on a longitudinal stroke dataset; second, we analyze current methodological limitations and outline key research directions to guide the development of next-generation infarct prediction models.

CVSep 26, 2022
Improving Multi-fidelity Optimization with a Recurring Learning Rate for Hyperparameter Tuning

HyunJae Lee, Gihyeon Lee, Junhwan Kim et al.

Despite the evolution of Convolutional Neural Networks (CNNs), their performance is surprisingly dependent on the choice of hyperparameters. However, it remains challenging to efficiently explore large hyperparameter search space due to the long training times of modern CNNs. Multi-fidelity optimization enables the exploration of more hyperparameter configurations given budget by early termination of unpromising configurations. However, it often results in selecting a sub-optimal configuration as training with the high-performing configuration typically converges slowly in an early phase. In this paper, we propose Multi-fidelity Optimization with a Recurring Learning rate (MORL) which incorporates CNNs' optimization process into multi-fidelity optimization. MORL alleviates the problem of slow-starter and achieves a more precise low-fidelity approximation. Our comprehensive experiments on general image classification, transfer learning, and semi-supervised learning demonstrate the effectiveness of MORL over other multi-fidelity optimization methods such as Successive Halving Algorithm (SHA) and Hyperband. Furthermore, it achieves significant performance improvements over hand-tuned hyperparameter configuration within a practical budget.

CVMay 25, 2022
Cross-Domain Style Mixing for Face Cartoonization

Seungkwon Kim, Chaeheon Gwak, Dohyun Kim et al.

Cartoon domain has recently gained increasing popularity. Previous studies have attempted quality portrait stylization into the cartoon domain; however, this poses a great challenge since they have not properly addressed the critical constraints, such as requiring a large number of training images or the lack of support for abstract cartoon faces. Recently, a layer swapping method has been used for stylization requiring only a limited number of training images; however, its use cases are still narrow as it inherits the remaining issues. In this paper, we propose a novel method called Cross-domain Style mixing, which combines two latent codes from two different domains. Our method effectively stylizes faces into multiple cartoon characters at various face abstraction levels using only a single generator without even using a large number of training images.

CVDec 29, 2025Code
Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision

Dohyun Kim, Seungwoo Lyu, Seung Wook Kim et al.

Diffusion models have achieved impressive results in generative tasks such as text-to-image synthesis, yet they often struggle to fully align outputs with nuanced user intent and maintain consistent aesthetic quality. Existing preference-based training methods like Diffusion Direct Preference Optimization help address these issues but rely on costly and potentially noisy human-labeled datasets. In this work, we introduce Direct Diffusion Score Preference Optimization (DDSPO), which directly derives per-timestep supervision from winning and losing policies when such policies are available. Unlike prior methods that operate solely on final samples, DDSPO provides dense, transition-level signals across the denoising trajectory. In practice, we avoid reliance on labeled data by automatically generating preference signals using a pretrained reference model: we contrast its outputs when conditioned on original prompts versus semantically degraded variants. This practical strategy enables effective score-space preference supervision without explicit reward modeling or manual annotations. Empirical results demonstrate that DDSPO improves text-image alignment and visual quality, outperforming or matching existing preference-based methods while requiring significantly less supervision. Our implementation is available at: https://dohyun-as.github.io/DDSPO

87.6CLApr 8Code
Raon-Speech Technical Report

Beomsoo Kim, Changho Choi, Dohyun Kim et al.

We present Raon-Speech, a top-performing 9B-parameter speech language model (SpeechLM) for English and Korean speech understanding, answering, and generation, and Raon-SpeechChat, a high-performing full-duplex extension for natural real-time conversation. Raon-Speech successfully transforms a pre-trained LLM into a SpeechLM that both understands and generates speech while preserving strong text capabilities. It trains on 1.38M hours of highly curated English and Korean speech and text datasets with the following training stages: (1) speech modules alignment, (2) end-to-end SpeechLM pre-training with knowledge distillation, and (3) multi-task preference optimization-based post-training. Across 42 English and Korean speech and text benchmarks, Raon-Speech establishes the strongest overall profile on speech-centric tasks in our comparison against eight similarly sized recent audio foundation models, including Qwen2.5-Omni and Fun-Audio-Chat, while preserving strong text question answering performance. Building upon it, Raon-SpeechChat enables natural full-duplex conversation by continual training on 119K hours of time-aligned real and synthetic dialogue data. It proceeds through three complementary training stages: (1) causal encoder adaptation, (2) full-duplex pre-training, (3) full-duplex fine-tuning for voice and role-control. On multiple full-duplex benchmarks, Raon-SpeechChat shows its clearest strengths on the turn-taking and interruption-sensitive behaviors covered by FDB v1.0, and remains competitive across the broader full-duplex evaluation suite. We open-source all model checkpoints, the training and inference pipeline, and an interactive demo.

93.8CLApr 9
EXAONE 4.5 Technical Report

Eunbi Choi, Kibong Choi, Sehyun Chun et al.

This technical report introduces EXAONE 4.5, the first open-weight vision language model released by LG AI Research. EXAONE 4.5 is architected by integrating a dedicated visual encoder into the existing EXAONE 4.0 framework, enabling native multimodal pretraining over both visual and textual modalities. The model is trained on large-scale data with careful curation, particularly emphasizing document-centric corpora that align with LG's strategic application domains. This targeted data design enables substantial performance gains in document understanding and related tasks, while also delivering broad improvements across general language capabilities. EXAONE 4.5 extends context length up to 256K tokens, facilitating long-context reasoning and enterprise-scale use cases. Comparative evaluations demonstrate that EXAONE 4.5 achieves competitive performance in general benchmarks while outperforming state-of-the-art models of similar scale in document understanding and Korean contextual reasoning. As part of LG's ongoing effort toward practical industrial deployment, EXAONE 4.5 is designed to be continuously extended with additional domains and application scenarios to advance AI for a better life.

72.9NAMay 12
The SiMPL Method for Multi-Material Topology Optimization

Peter Gangl, Brendan Keith, Dohyun Kim et al.

We introduce an efficient and scalable method for density-based multi-material topology optimization, integrating classical mirror descent techniques with point-wise polytopal design constraints. Such constraints arise naturally in this class of problems, wherein the vertices of convex polytopes correspond to distinct design states, only one of which should be occupied at each point in space. The framework generates a descending sequence of iterates by penalizing the design space around the previous iterate with a generalized distance function tailored to the convex geometry of the $n$-dimensional polytope. This distance function, called a Bregman divergence, smooths the optimization landscape, ensuring that each iterate strictly satisfies the point-wise constraints. Subsequently, global constraints (e.g., bounds on the structural mass) can be enforced easily by solving a small, finite-dimensional dual problem. The resulting method is simple to implement and demonstrates robustness and efficiency when combined with an Armijo-type line search algorithm. We validate the method in structural design problems involving the optimal arrangement of both isotropic and anisotropic materials, as well as magnetic flux optimization in electric motors.

71.1NAApr 21
Proximal Discontinuous Galerkin Methods for Variational Inequalities

Alexandre Ern, Brendan Keith, Dohyun Kim et al.

We introduce a family of proximal discontinuous Galerkin methods for variational inequalities, focusing on the obstacle problem as a didactic example. Each member of this family is born from applying a different well-known nonconforming finite element discretization to the Bregman proximal point method. We explicitly treat four examples: the symmetric interior penalty discontinuous Galerkin, the enriched Galerkin, the hybridizable interior penalty and the hybrid high-order methods. We formulate a unified analysis framework for this family of methods and prove the existence and uniqueness of solutions, energy dissipation, and error estimates for both the primal and dual variables. Remarkably, the proximal hybrid high-order method with piecewise constant cell unknowns and piecewise affine cell unknowns leads to the first higher-order convergence result for any proximal Galerkin method.

ROFeb 2, 2024
LINGO-Space: Language-Conditioned Incremental Grounding for Space

Dohyun Kim, Nayoung Oh, Deokmin Hwang et al.

We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (LINGO-Space) that accurately identifies a probabilistic distribution of space being referred to and incrementally updates it, given subsequent referring expressions leveraging configurable polar distributions. Our evaluations show that the estimation using polar distributions enables a robot to ground locations successfully through $20$ table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referring space. We finally demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. Code and videos are available at https://lingo-space.github.io.

LGNov 4, 2024
Learning from Convolution-based Unlearnable Datasets

Dohyun Kim, Pedro Sandoval-Segura

The construction of large datasets for deep learning has raised concerns regarding unauthorized use of online data, leading to increased interest in protecting data from third-parties who want to use it for training. The Convolution-based Unlearnable DAtaset (CUDA) method aims to make data unlearnable by applying class-wise blurs to every image in the dataset so that neural networks learn relations between blur kernels and labels, as opposed to informative features for classifying clean data. In this work, we evaluate whether CUDA data remains unlearnable after image sharpening and frequency filtering, finding that this combination of simple transforms improves the utility of CUDA data for training. In particular, we observe a substantial increase in test accuracy over adversarial training for models trained with CUDA unlearnable data from CIFAR-10, CIFAR-100, and ImageNet-100. In training models to high accuracy using unlearnable data, we underscore the need for ongoing refinement in data poisoning techniques to ensure data privacy. Our method opens new avenues for enhancing the robustness of unlearnable datasets by highlighting that simple methods such as sharpening and frequency filtering are capable of breaking convolution-based unlearnable datasets.

CVNov 25, 2025
Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos

Youngseo Kim, Dohyun Kim, Geonhee Han et al.

Image diffusion models, though originally developed for image generation, implicitly capture rich semantic structures that enable various recognition and localization tasks beyond synthesis. In this work, we investigate their self-attention maps can be reinterpreted as semantic label propagation kernels, providing robust pixel-level correspondences between relevant image regions. Extending this mechanism across frames yields a temporal propagation kernel that enables zero-shot object tracking via segmentation in videos. We further demonstrate the effectiveness of test-time optimization strategies-DDIM inversion, textual inversion, and adaptive head weighting-in adapting diffusion features for robust and consistent label propagation. Building on these findings, we introduce DRIFT, a framework for object tracking in videos leveraging a pretrained image diffusion model with SAM-guided mask refinement, achieving state-of-the-art zero-shot performance on standard video object segmentation benchmarks.

ROMay 13, 2025
Reinforcement Learning-based Fault-Tolerant Control for Quadrotor with Online Transformer Adaptation

Dohyun Kim, Jayden Dongwoo Lee, Hyochoong Bang et al.

Multirotors play a significant role in diverse field robotics applications but remain highly susceptible to actuator failures, leading to rapid instability and compromised mission reliability. While various fault-tolerant control (FTC) strategies using reinforcement learning (RL) have been widely explored, most previous approaches require prior knowledge of the multirotor model or struggle to adapt to new configurations. To address these limitations, we propose a novel hybrid RL-based FTC framework integrated with a transformer-based online adaptation module. Our framework leverages a transformer architecture to infer latent representations in real time, enabling adaptation to previously unseen system models without retraining. We evaluate our method in a PyBullet simulation under loss-of-effectiveness actuator faults, achieving a 95% success rate and a positional root mean square error (RMSE) of 0.129 m, outperforming existing adaptation methods with 86% success and an RMSE of 0.153 m. Further evaluations on quadrotors with varying configurations confirm the robustness of our framework across untrained dynamics. These results demonstrate the potential of our framework to enhance the adaptability and reliability of multirotors, enabling efficient fault management in dynamic and uncertain environments. Website is available at http://00dhkim.me/paper/rl-ftc

LGApr 2, 2025
Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression

Dohyun Kim, Sehwan Park, Geonhee Han et al.

Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling. Knowledge distillation, which transfers knowledge from a complex teacher to a simpler student model, has been widely studied in recognition tasks, particularly for transferring concepts unseen during student training. However, its application to diffusion models remains underexplored, especially in enabling student models to generate concepts not covered by the training images. In this work, we propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation. By leveraging this technique, we show that the student can generate concepts unseen in the training images. When applied to conditional diffusion model distillation, our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency. This promotes resource-efficient deployment of generative diffusion models, broadening their accessibility for both research and real-world applications. Code, models, and datasets are available at https://dohyun-as.github.io/Random-Conditioning .

ARJan 12, 2025
COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators

Jihoon Park, Jeongin Choe, Dohyun Kim et al.

Recently, crossbar array based in-memory accelerators have been gaining interest due to their high throughput and energy efficiency. While software and compiler support for the in-memory accelerators has also been introduced, they are currently limited to the case where all weights are assumed to be on-chip. This limitation becomes apparent with the significantly increasing network sizes compared to the in-memory footprint. Weight replacement schemes are essential to address this issue. We propose COMPASS, a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators. COMPASS is specially targeted for networks that exceed the capacity of PIM crossbar arrays, necessitating access to external memories. We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip. Our scheme takes into account the data dependence between layers, core utilization, and the number of write instructions to minimize latency, memory accesses, and improve energy efficiency. Simulation results demonstrate that COMPASS can accommodate much more networks using a minimal memory footprint, while improving throughput by 1.78X and providing 1.28X savings in energy-delay product (EDP) over baseline partitioning methods.

SYJan 12, 2020
Self-Driving like a Human driver instead of a Robocar: Personalized comfortable driving experience for autonomous vehicles

Il Bae, Jaeyoung Moon, Junekyo Jhung et al.

This paper issues an integrated control system of self-driving autonomous vehicles based on the personal driving preference to provide personalized comfortable driving experience to autonomous vehicle users. We propose an Occupant's Preference Metric (OPM) which is defining a preferred lateral and longitudinal acceleration region with maximum allowable jerk for users. Moreover, we propose a vehicle controller based on control parameters enabling integrated lateral and longitudinal control via preference-aware maneuvering of autonomous vehicles. The proposed system not only provides the criteria for the occupant's driving preference, but also provides a personalized autonomous self-driving style like a human driver instead of a Robocar. The simulation and experimental results demonstrated that the proposed system can maneuver the self-driving vehicle like a human driver by tracking the specified criterion of admissible acceleration and jerk.

HCDec 28, 2019
Real World Longitudinal iOS App Usage Study at Scale

Dohyun Kim, Joshua Gluck, Malcolm Hall et al.

Given the importance of understanding the interaction between mobile devices and their users, app usage patterns have been studied in various contexts. However, prior work has not fully investigated longitudinal changes to app usage behavior. In this paper, we present a longitudinal, large-scale study of mobile app usage based on a dataset collected from 162,006 iPhones and iPads over 4 years. We explore multiple dimensions of app usage pattern proving useful insights on how app usage changes over time. Our key findings include (i) app usage pattern changes over time both at the individual app level and the app category level (i.e. proportion of time a user spends using an app), (ii) users keep a small set of apps frequently launched (90% of iPhone users launch roughly 14-18 apps weekly), (iii) a small number of apps remain popular while some specific kinds of apps (e.g. Games) have a shorter life cycle compared to other apps of different categories. Finally, we discuss our findings and their implications, for example, a short-term study as an attempt to understand the general needs of mobile devices may not achieve useful results for the long term.

CVAug 10, 2019
Deep ensemble network with explicit complementary model for accuracy-balanced classification

Dohyun Kim, Kyeorye Lee, Jiyeon Kim et al.

The average accuracy is one of major evaluation metrics for classification systems, while the accuracy deviation is another important performance metric used to evaluate various deep neural networks. In this paper, we present a new ensemble-like fast deep neural network, Harmony, that can reduce the accuracy deviation among categories without degrading overall average accuracy. Harmony consists of three sub-models, namely, Target model, Complementary model, and Conductor model. In Harmony, an object is classified by using either Target model or Complementary model. Target model is a conventional classification network for general categories, while Complementary model is a classification network especially for weak categories that are inaccurately classified by Target model. Conductor model is used to select one of two models. Experimental results demonstrate that Harmony accurately classifies categories, while it reduces the accuracy deviation among the categories.

CRAug 31, 2017
Be Selfish and Avoid Dilemmas: Fork After Withholding (FAW) Attacks on Bitcoin

Yujin Kwon, Dohyun Kim, Yunmok Son et al.

In the Bitcoin system, participants are rewarded for solving cryptographic puzzles. In order to receive more consistent rewards over time, some participants organize mining pools and split the rewards from the pool in proportion to each participant's contribution. However, several attacks threaten the ability to participate in pools. The block withholding (BWH) attack makes the pool reward system unfair by letting malicious participants receive unearned wages while only pretending to contribute work. When two pools launch BWH attacks against each other, they encounter the miner's dilemma: in a Nash equilibrium, the revenue of both pools is diminished. In another attack called selfish mining, an attacker can unfairly earn extra rewards by deliberately generating forks. In this paper, we propose a novel attack called a fork after withholding (FAW) attack. FAW is not just another attack. The reward for an FAW attacker is always equal to or greater than that for a BWH attacker, and it is usable up to four times more often per pool than in BWH attack. When considering multiple pools - the current state of the Bitcoin network - the extra reward for an FAW attack is about 56% more than that for a BWH attack. Furthermore, when two pools execute FAW attacks on each other, the miner's dilemma may not hold: under certain circumstances, the larger pool can consistently win. More importantly, an FAW attack, while using intentional forks, does not suffer from practicality issues, unlike selfish mining. We also discuss partial countermeasures against the FAW attack, but finding a cheap and efficient countermeasure remains an open problem. As a result, we expect to see FAW attacks among mining pools.