Yong Suk Choi

CL
h-index1
8papers
1,014citations
Novelty49%
AI Score56

8 Papers

32.7CVMar 10
LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Mingyu Kang, Hyein Seo, Yuna Jeong et al.

Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existing methods often distort character geometry when applying creative styles and struggle to support multilingual text generation without additional training. To address these challenges, we propose LogoDiffuser, a training-free method that synthesizes multilingual logo designs using the multimodal diffusion transformer. Instead of using textual prompts, we input the target characters as images, enabling robust character structure control regardless of language. We first analyze the joint attention mechanism to identify core tokens, which are tokens that strongly respond to textual structures. With this observation, our method integrates character structure and visual design by injecting the most informative attention maps. Furthermore, we perform layer-wise aggregation of attention maps to mitigate attention shifts across layers and obtain consistent core tokens. Extensive experiments and user studies demonstrate that our method achieves state-of-the-art performance in multilingual logo generation.

CLMay 17, 2023Code
Balancing Lexical and Semantic Quality in Abstractive Summarization

Jeewoo Sul, Yong Suk Choi

An important problem of the sequence-to-sequence neural models widely used in abstractive summarization is exposure bias. To alleviate this problem, re-ranking systems have been applied in recent years. Despite some performance improvements, this approach remains underexplored. Previous works have mostly specified the rank through the ROUGE score and aligned candidate summaries, but there can be quite a large gap between the lexical overlap metric and semantic similarity. In this paper, we propose a novel training method in which a re-ranker balances the lexical and semantic quality. We further newly define false positives in ranking and present a strategy to reduce their influence. Experiments on the CNN/DailyMail and XSum datasets show that our method can estimate the meaning of summaries without seriously degrading the lexical aspect. More specifically, it achieves an 89.67 BERTScore on the CNN/DailyMail dataset, reaching new state-of-the-art performance. Our code is publicly available at https://github.com/jeewoo1025/BalSum.

CLSep 9, 2021Code
Graph Based Network with Contextualized Representations of Turns in Dialogue

Bongseok Lee, Yong Suk Choi

Dialogue-based relation extraction (RE) aims to extract relation(s) between two arguments that appear in a dialogue. Because dialogues have the characteristics of high personal pronoun occurrences and low information density, and since most relational facts in dialogues are not supported by any single sentence, dialogue-based relation extraction requires a comprehensive understanding of dialogue. In this paper, we propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN) modeled by paying attention to the way people understand dialogues. In addition, we propose a novel approach which treats the task of emotion recognition in conversations (ERC) as a dialogue-based RE. Experiments on a dialogue-based RE dataset and three ERC datasets demonstrate that our model is very effective in various dialogue-based natural language understanding tasks. In these experiments, TUCORE-GCN outperforms the state-of-the-art models on most of the benchmark datasets. Our code is available at https://github.com/BlackNoodle/TUCORE-GCN.

53.3AIApr 7
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models

Keuntae Kim, Mingyu Kang, Yong Suk Choi

Diffusion large language models (dLLMs) are emerging as promising alternatives to autoregressive (AR) LLMs. Recently, this paradigm has been extended to multimodal tasks, leading to the development of diffusion multimodal large language models (dMLLMs). These models are expected to retain the reasoning capabilities of LLMs while enabling faster inference through parallel generation. However, when combined with Chain-of-Thought (CoT) reasoning, dMLLMs exhibit two critical issues. First, we observe that dMLLMs often generate the final answer token at a very early timestep. This trend indicates that the model determines the answer before sufficient reasoning, leading to degraded reasoning performance. Second, during the initial timesteps, dMLLMs show minimal dependency on visual prompts, exhibiting a fundamentally different pattern of visual information utilization compared to AR vision-language models. In summary, these findings indicate that dMLLMs tend to generate premature final answers without sufficiently grounding on visual inputs. To address these limitations, we propose Position and Step Penalty (PSP) and Visual Reasoning Guidance (VRG). PSP penalizes tokens in later positions during early timesteps, delaying premature answer generation and encouraging progressive reasoning across timesteps. VRG, inspired by classifier-free guidance, amplifies visual grounding signals to enhance the model's alignment with visual evidence. Extensive experiments across various dMLLMs demonstrate that our method achieves up to 7.5% higher accuracy while delivering more than 3x speedup compared to reasoning with four times more diffusion steps.

CLOct 20, 2025
CLAWS:Creativity detection for LLM-generated solutions using Attention Window of Sections

Keuntae Kim, Eunhye Jeong, Sehyeon Lee et al.

Recent advances in enhancing the reasoning ability of large language models (LLMs) have been remarkably successful. LLMs trained with reinforcement learning (RL) for reasoning demonstrate strong performance in challenging tasks such as mathematics and coding, even with relatively small model sizes. However, despite these improvements in task accuracy, the assessment of creativity in LLM generations has been largely overlooked in reasoning tasks, in contrast to writing tasks. The lack of research on creativity assessment in reasoning primarily stems from two challenges: (1) the difficulty of defining the range of creativity, and (2) the necessity of human evaluation in the assessment process. To address these challenges, we propose CLAWS, a method that defines and classifies mathematical solutions into typical, creative, and hallucinated categories without human evaluation, by leveraging attention weights across prompt sections and output. CLAWS outperforms five existing white-box detection methods (Perplexity, Logit Entropy, Window Entropy, Hidden Score, and Attention Score) on five 7-8B math RL models (DeepSeek, Qwen, Mathstral, OpenMath2, and Oreal). We validate CLAWS on 4545 math problems collected from 181 math contests (AJHSME, AMC, AIME).

CVSep 30, 2025
Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation

Mingyu Kang, Yong Suk Choi

Text-to-image diffusion models have achieved remarkable success in generating high-quality and diverse images. Building on these advancements, diffusion models have also demonstrated exceptional performance in text-guided image editing. A key strategy for effective image editing involves inverting the source image into editable noise maps associated with the target image. However, previous inversion methods face challenges in adhering closely to the target text prompt. The limitation arises because inverted noise maps, while enabling faithful reconstruction of the source image, restrict the flexibility needed for desired edits. To overcome this issue, we propose Editable Noise Map Inversion (ENM Inversion), a novel inversion technique that searches for optimal noise maps to ensure both content preservation and editability. We analyze the properties of noise maps for enhanced editability. Based on this analysis, our method introduces an editable noise refinement that aligns with the desired edits by minimizing the difference between the reconstructed and edited noise maps. Extensive experiments demonstrate that ENM Inversion outperforms existing approaches across a wide range of image editing tasks in both preservation and edit fidelity with target prompts. Our approach can also be easily applied to video editing, enabling temporal consistency and content manipulation across frames.

CLJan 23, 2019
Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing

Joohong Lee, Sangwoo Seo, Yong Suk Choi

Classifying semantic relations between entity pairs in sentences is an important task in Natural Language Processing (NLP). Most previous models for relation classification rely on the high-level lexical and syntactic features obtained by NLP tools such as WordNet, dependency parser, part-of-speech (POS) tagger, and named entity recognizers (NER). In addition, state-of-the-art neural models based on attention mechanisms do not fully utilize information of entity that may be the most crucial features for relation classification. To address these issues, we propose a novel end-to-end recurrent neural model which incorporates an entity-aware attention mechanism with a latent entity typing (LET) method. Our model not only utilizes entities and their latent types as features effectively but also is more interpretable by visualizing attention mechanisms applied to our model and results of LET. Experimental results on the SemEval-2010 Task 8, one of the most popular relation classification task, demonstrate that our model outperforms existing state-of-the-art models without any high-level features.

CLJan 23, 2019
A Tool for Spatio-Temporal Analysis of Social Anxiety with Twitter Data

Joohong Lee, Dongyoung Son, Yong Suk Choi

In this paper, we present a tool for analyzing spatio-temporal distribution of social anxiety. Twitter, one of the most popular social network services, has been chosen as data source for analysis of social anxiety. Tweets (posted on the Twitter) contain various emotions and thus these individual emotions reflect social atmosphere and public opinion, which are often dependent on spatial and temporal factors. The reason why we choose anxiety among various emotions is that anxiety is very important emotion that is useful for observing and understanding social events of communities. We develop a machine learning based tool to analyze the changes of social atmosphere spatially and temporally. Our tool classifies whether each Tweet contains anxious content or not, and also estimates degree of Tweet anxiety. Furthermore, it also visualizes spatio-temporal distribution of anxiety as a form of web application, which is incorporated with physical map, word cloud, search engine and chart viewer. Our tool is applied to a big tweet data in South Korea to illustrate its usefulness for exploring social atmosphere and public opinion spatio-temporally.