Jaa-Yeon Lee

h-index: 13

4 papers

15 citations

Novelty: 51%

AI Score: 43

Ranked #80,278 of 201,326 authors (top 40%); #27,582 in CV (top 47%)

4 Papers

LG · May 28

Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models

Jaa-Yeon Lee, Yeobin Hong, Taesung Kwon et al.

Diffusion models generate highly realistic images but often struggle with precise text-image alignment. While recent post-training methods improve alignment using external rewards or human preference signals, their performance depends heavily on reward quality, and they do not directly address alignment within the diffusion process itself. Recent reward-free approaches such as SoftREPA demonstrate that optimizing soft text tokens via contrastive learning can effectively improve text-image representation alignment, outperforming standard parameter-efficient fine-tuning baselines. However, the contrastive formulation can excessively penalize negative pairs, which manifests as characteristic failure cases such as over-counting and repetition. To address this issue, we propose a lightweight, reward-free post-training method that refines soft tokens by integrating contrastive alignment guidance directly into the score-matching objective of diffusion models. By assigning alignment directions at the score level, our approach mitigates these limitations and yields more coherent and semantically faithful generations. Experiments show that our method matches SoftREPA while substantially improving its failure cases, achieving over 35% improvement in counting accuracy on the GenEval benchmark. Our method is seamlessly applicable to existing diffusion backbones (SD1.5, SDXL, and SD3) and is complementary to existing RL-based diffusion post-training methods. Project page: https://jaayeon.github.io/AGSM

IV · Mar 16

UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation for Assessing Hepatic Steatosis

Kwanyoung Kim, Jaa-Yeon Lee, Youngjun Ko et al.

Ultrasound imaging is an essential first-line tool for assessing hepatic steatosis. While conventional B-mode ultrasound imaging has limitations in providing detailed tissue characterization, ultrasound Nakagami imaging holds promise for visualizing and quantifying tissue scattering in backscattered signals, with potential applications in fat fraction analysis. However, existing methods for Nakagami imaging struggle with optimal window size selection and suffer from estimator instability, leading to degraded image resolution. To address these challenges, we propose a novel method called UNICORN (Ultrasound Nakagami Imaging via Score Matching and Adaptation), which offers an accurate, closed-form estimator for Nakagami parameter estimation based on the score function of the ultrasound envelope signal. Unlike methods that visualize only specific regions of interest (ROI) and estimate parameters within fixed window sizes, our approach provides comprehensive parameter mapping through a pixel-by-pixel estimator, resulting in high-resolution imaging. We demonstrated that our proposed estimator effectively assesses hepatic steatosis and provides visual distinction in the backscattered statistics associated with this condition. Through extensive experiments using real envelope data from patients, we validated that UNICORN enables clinical detection of hepatic steatosis and exhibits robustness and generalizability.

CV · Mar 11, 2025

Aligning Text to Image in Diffusion Models is Easier Than You Think

Jaa-Yeon Lee, Byunghee Cha, Jeongsol Kim et al.

While recent advancements in generative modeling have significantly improved text-image alignment, some residual misalignment between text and image representations remains. Some approaches address this issue by fine-tuning models via preference optimization and similar techniques, which require tailored datasets. Orthogonal to these methods, we revisit the challenge from the perspective of representation alignment, an approach that has gained popularity with the success of REPresentation Alignment (REPA). We first argue that conventional text-to-image (T2I) diffusion models, typically trained on paired image and text data (i.e., positive pairs) by minimizing score matching or flow matching losses, are suboptimal from the standpoint of representation alignment. Instead, better alignment can be achieved through contrastive learning that uses the existing dataset as both positive and negative pairs. To enable efficient alignment with pretrained models, we propose SoftREPA, a lightweight contrastive fine-tuning strategy that leverages soft text tokens for representation alignment. This approach improves alignment with minimal computational overhead by adding fewer than 1M trainable parameters to the pretrained model. Our theoretical analysis demonstrates that our method explicitly increases the mutual information between text and image representations, leading to enhanced semantic consistency. Experimental results across text-to-image generation and text-guided image editing tasks validate the effectiveness of our approach in improving the semantic consistency of T2I generative models.
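The idea of treating one dataset as both positive and negative pairs can be illustrated with a generic InfoNCE-style loss. The sketch below is not the SoftREPA objective: the function name `info_nce_loss`, the `temperature` value, and the toy similarity matrices are assumptions made for illustration. It assumes a square matrix whose entry `sim[i][j]` scores image `i` against text `j`, so diagonal entries are positive pairs and everything else is a negative.

```python
import math


def info_nce_loss(sim, temperature=0.1):
    """Average InfoNCE loss over the rows of a square similarity matrix.

    sim[i][j] is a hypothetical similarity score between image i and text j.
    The diagonal holds the positive pairs; off-diagonal entries are negatives.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]  # -log softmax at the positive pair
    return total / n


# Toy inputs (illustrative only): matched pairs on the diagonal vs. mismatched.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

if __name__ == "__main__":
    print("aligned :", info_nce_loss(aligned))
    print("shuffled:", info_nce_loss(shuffled))
```

The loss is lower when each image scores highest against its own caption, which is the sense in which a contrastive objective rewards positives and penalizes negatives drawn from the same batch.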

CV · Mar 10, 2024

UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation

Kwanyoung Kim, Jaa-Yeon Lee, Jong Chul Ye

Nakagami imaging holds promise for visualizing and quantifying tissue scattering in ultrasound waves, with potential applications in tumor diagnosis and fat fraction estimation, which are challenging to discern in conventional ultrasound B-mode images. Existing methods struggle with optimal window size selection and suffer from estimator instability, leading to degraded image resolution. To address this, we propose a novel method called UNICORN (Ultrasound Nakagami Imaging via Score Matching and Adaptation), which offers an accurate, closed-form estimator for Nakagami parameter estimation in terms of the score function of the ultrasonic envelope. Extensive experiments using simulation and real ultrasound RF data demonstrate UNICORN's superiority over conventional approaches in accuracy and resolution quality.
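For context on the window-size issue, the following is a minimal sketch of the classical moment-based Nakagami estimator applied in a fixed sliding window. It is the kind of conventional baseline the abstract contrasts against, not the UNICORN score-based estimator. The function names, the 1-D layout, and the synthetic Rayleigh test signal are assumptions made for illustration. It relies on the standard identities m = (E[R^2])^2 / Var(R^2) and Ω = E[R^2] for an envelope R.

```python
import random


def nakagami_moments(envelope):
    """Moment-based estimate of the Nakagami shape m and scale omega."""
    n = len(envelope)
    power = [r * r for r in envelope]
    omega = sum(power) / n  # scale parameter: E[R^2]
    variance = sum((p - omega) ** 2 for p in power) / n  # Var(R^2)
    return omega * omega / variance, omega


def windowed_m(envelope, window):
    """Estimate m in every fixed-size window along a 1-D envelope signal."""
    return [
        nakagami_moments(envelope[i:i + window])[0]
        for i in range(len(envelope) - window + 1)
    ]


def rayleigh_envelope(n, sigma=1.0, seed=0):
    """Synthetic envelope (hypothetical test data): Rayleigh, i.e. Nakagami m = 1."""
    rng = random.Random(seed)
    return [
        (rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2) ** 0.5
        for _ in range(n)
    ]


if __name__ == "__main__":
    signal = rayleigh_envelope(50000)
    print("full-signal (m, omega):", nakagami_moments(signal))
    small = windowed_m(signal[:2000], 16)
    print("m range with a 16-sample window:", min(small), max(small))
```

With many samples the estimate settles near m = 1 for Rayleigh data, while small windows give widely scattered values. Larger windows stabilize the estimate at the cost of spatial resolution, which is the trade-off the abstract describes.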