Soobin Um

LG
h-index12
7papers
73citations
Novelty66%
AI Score60

7 Papers

LGJan 29, 2023Code
Don't Play Favorites: Minority Guidance for Diffusion Models

Soobin Um, Suhyeon Lee, Jong Chul Ye

We explore the problem of generating minority samples using diffusion models. The minority samples are instances that lie on low-density regions of a data manifold. Generating a sufficient number of such minority instances is important, since they often contain some unique attributes of the data. However, the conventional generation process of the diffusion models mostly yields majority samples (that lie on high-density regions of the manifold) due to their high likelihoods, making themselves ineffective and time-consuming for the minority generating task. In this work, we present a novel framework that can make the generation process of the diffusion models focus on the minority samples. We first highlight that Tweedie's denoising formula yields favorable results for majority samples. The observation motivates us to introduce a metric that describes the uniqueness of a given sample. To address the inherent preference of the diffusion models w.r.t. the majority samples, we further develop minority guidance, a sampling technique that can guide the generation process toward regions with desired likelihood levels. Experiments on benchmark real datasets demonstrate that our minority guidance can greatly improve the capability of generating high-quality minority samples over existing generative samplers. We showcase that the performance benefit of our framework persists even in demanding real-world scenarios such as medical imaging, further underscoring the practical significance of our work. Code is available at https://github.com/soobin-um/minority-guidance.

LGMay 23Code
Beyond Generative Priors: Minority Sampling with JEPA-Guided Diffusion

Sol Park, Soobin Um

Minority sampling aims to generate low-density instances on a data manifold and is of central importance in applications such as medical diagnosis, anomaly detection, and creative AI. Existing approaches, however, define minority samples relative to generative priors learned from training data, confining rarity to model-specific notions that may poorly reflect real-world semantics. In this work, we propose a world-centric perspective on minority sampling, which defines rarity with respect to real-world priors rather than generator-induced densities. To this end, we introduce JEPA guidance, a diffusion sampling framework guided by a Joint-Embedding Predictive Architecture (JEPA) -- a class of world models that encode broad, semantically rich representations. JEPA guidance steers diffusion trajectories toward low-density regions under the implicit density induced by the JEPA, thereby aligning generated minorities with real-world semantic rarity. To make JEPA guidance computationally practical, we develop principled approximation strategies accompanied by theoretical error bounds, significantly reducing the overhead of guidance computation. Extensive experiments across unconditional, class-conditional, and text-to-image generation demonstrate that JEPA guidance consistently improves the fidelity and semantic validity of minority samples, outperforming generator-centric baselines in capturing real-world notions of rarity. Code is available at https://github.com/soobin-um/jepa-guidance.

CVJul 16, 2024Code
Self-Guided Generation of Minority Samples Using Diffusion Models

Soobin Um, Jong Chul Ye

We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its \emph{self-contained} nature, \ie, implementable solely with a pretrained model. This distinguishes our sampler from existing techniques that require expensive additional components (like external classifiers) for minority generation. Specifically, we first estimate the likelihood of features within an intermediate latent sample by evaluating a reconstruction loss w.r.t. its posterior mean. The generation then proceeds with the minimization of the estimated likelihood, thereby encouraging the emergence of minority features in the latent samples of subsequent timesteps. To further improve the performance of our sampler, we provide several time-scheduling techniques that properly manage the influence of guidance over inference steps. Experiments on benchmark real datasets demonstrate that our approach can greatly improve the capability of creating realistic low-likelihood minority instances over the existing techniques without the reliance on costly additional elements. Code is available at \url{https://github.com/soobin-um/sg-minority}.

OPTICSApr 23, 2025Code
Physics-guided and fabrication-aware inverse design of photonic devices using diffusion models

Dongjin Seo, Soobin Um, Sangbin Lee et al.

Designing free-form photonic devices is fundamentally challenging due to the vast number of possible geometries and the complex requirements of fabrication constraints. Traditional inverse-design approaches--whether driven by human intuition, global optimization, or adjoint-based gradient methods--often involve intricate binarization and filtering steps, while recent deep learning strategies demand prohibitively large numbers of simulations (10^5 to 10^6). To overcome these limitations, we present AdjointDiffusion, a physics-guided framework that integrates adjoint sensitivity gradients into the sampling process of diffusion models. AdjointDiffusion begins by training a diffusion network on a synthetic, fabrication-aware dataset of binary masks. During inference, we compute the adjoint gradient of a candidate structure and inject this physics-based guidance at each denoising step, steering the generative process toward high figure-of-merit (FoM) solutions without additional post-processing. We demonstrate our method on two canonical photonic design problems--a bent waveguide and a CMOS image sensor color router--and show that our method consistently outperforms state-of-the-art nonlinear optimizers (such as MMA and SLSQP) in both efficiency and manufacturability, while using orders of magnitude fewer simulations (approximately 2 x 10^2) than pure deep learning approaches (approximately 10^5 to 10^6). By eliminating complex binarization schedules and minimizing simulation overhead, AdjointDiffusion offers a streamlined, simulation-efficient, and fabrication-aware pipeline for next-generation photonic device design. Our open-source implementation is available at https://github.com/dongjin-seo2020/AdjointDiffusion.

LGFeb 10, 2025Code
Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation

Soobin Um, Beomsu Kim, Jong Chul Ye

Minority samples are underrepresented instances located in low-density regions of a data manifold, and are valuable in many generative AI applications, such as data augmentation, creative content generation, etc. Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated for minority generation. To address this, here we present a simple yet powerful guidance-free approach called Boost-and-Skip for generating minority samples using diffusion models. The key advantage of our framework requires only two minimal changes to standard generative processes: (i) variance-boosted initialization and (ii) timestep skipping. We highlight that these seemingly-trivial modifications are supported by solid theoretical and empirical evidence, thereby effectively promoting emergence of underrepresented minority features. Our comprehensive experiments demonstrate that Boost-and-Skip greatly enhances the capability of generating minority samples, even rivaling guidance-based state-of-the-art approaches while requiring significantly fewer computations. Code is available at https://github.com/soobin-um/BnS.

CVMar 14
MotionCFG: Boosting Motion Dynamics via Stochastic Concept Perturbation

Byungjun Kim, Soobin Um, Jong Chul Ye

Despite recent advances in Text-to-Video (T2V) synthesis, generating high-fidelity and dynamic motion remains a significant challenge. Existing methods primarily rely on Classifier-Free Guidance (CFG), often with explicit negative prompts (e.g. "static", "blurry"), to suppress undesired artifacts. However, such explicit negations frequently introduce unintended semantic bias and distort object integrity; a phenomenon we define as Content-Motion Drift. To address this, we propose MotionCFG, a framework that enhances motion dynamics by contrasting a target concept with its noise-perturbed counterparts. Specifically, by injecting Gaussian noise into the concept embeddings, MotionCFG creates localized negative anchors that encapsulate a broad complementary space of sub-optimal motion variations. Unlike explicit negations, this approach facilitates implicit hard negative mining without shifting the global semantic identity, allowing for a focused refinement of temporal details. Combined with a piecewise guidance schedule that confines intervention to the early denoising steps, MotionCFG consistently improves motion dynamics across state-of-the-art T2V frameworks with negligible computational overhead and minimal compromise in visual quality. Additionally, we demonstrate that this noise-induced contrastive mechanism is effective not only for sharpening motion trajectories but also for steering complex, non-linear concepts such as precise object numerosity, which are typically difficult to modulate via standard text-based guidance.

GROct 4, 2025
Diverse Text-to-Image Generation via Contrastive Noise Optimization

Byungjun Kim, Soobin Um, Jong Chul Ye

Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images, largely enabled by text-guided inference. However, this advantage often comes with a critical drawback: limited diversity, as outputs tend to collapse into similar modes under strong text guidance. Existing approaches typically optimize intermediate latents or text conditions during inference, but these methods deliver only modest gains or remain sensitive to hyperparameter tuning. In this work, we introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective. Unlike prior techniques that adapt intermediate latents, our approach shapes the initial noise to promote diverse outputs. Specifically, we develop a contrastive loss defined in the Tweedie data space and optimize a batch of noise latents. Our contrastive optimization repels instances within the batch to maximize diversity while keeping them anchored to a reference sample to preserve fidelity. We further provide theoretical insights into the mechanism of this preprocessing to substantiate its effectiveness. Extensive experiments across multiple T2I backbones demonstrate that our approach achieves a superior quality-diversity Pareto frontier while remaining robust to hyperparameter choices.