Deunsol Yoon

LG
h-index7
5papers
56citations
Novelty64%
AI Score51

5 Papers

LGFeb 9
CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

Hyungseok Song, Deunsol Yoon, Kanghoon Lee et al.

Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.

AIOct 15, 2024
Unsupervised Training of Diffusion Models for Feasible Solution Generation in Neural Combinatorial Optimization

Seong-Hyun Hong, Hyun-Sung Kim, Zian Jang et al.

Recent advancements in neural combinatorial optimization (NCO) methods have shown promising results in generating near-optimal solutions without the need for expert-crafted heuristics. However, high performance of these approaches often rely on problem-specific human-expertise-based search after generating candidate solutions, limiting their applicability to commonly solved CO problems such as Traveling Salesman Problem (TSP). In this paper, we present IC/DC, an unsupervised CO framework that directly trains a diffusion model from scratch. We train our model in a self-supervised way to minimize the cost of the solution while adhering to the problem-specific constraints. IC/DC is specialized in addressing CO problems involving two distinct sets of items, and it does not need problem-specific search processes to generate valid solutions. IC/DC employs a novel architecture capable of capturing the intricate relationships between items, and thereby enabling effective optimization in challenging CO scenarios. IC/DC achieves state-of-the-art performance relative to existing NCO methods on the Parallel Machine Scheduling Problem (PMSP) and Asymmetric Traveling Salesman Problem (ATSP).

LGJul 11, 2025
Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

Jeonghye Kim, Yongjae Shin, Whiyoung Jung et al.

Reinforcement learning with offline data suffers from Q-value extrapolation errors. To address this issue, we first demonstrate that linear extrapolation of the Q-function beyond the data range is particularly problematic. To mitigate this, we propose guiding the gradual decrease of Q-values outside the data range, which is achieved through reward scaling with layer normalization (RS-LN) and a penalization mechanism for infeasible actions (PA). By combining RS-LN and PA, we develop a new algorithm called PARS. We evaluate PARS across a range of tasks, demonstrating superior performance compared to state-of-the-art algorithms in both offline training and online fine-tuning on the D4RL benchmark, with notable success in the challenging AntMaze Ultra task.

LGJul 11, 2025
Online Pre-Training for Offline-to-Online Reinforcement Learning

Yongjae Shin, Jeonghye Kim, Whiyoung Jung et al.

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.

LGAug 22, 2018
Dynamic Self-Attention : Computing Attention over Words Dynamically for Sentence Embedding

Deunsol Yoon, Dongbok Lee, SangKeun Lee

In this paper, we propose Dynamic Self-Attention (DSA), a new self-attention mechanism for sentence embedding. We design DSA by modifying dynamic routing in capsule network (Sabouretal.,2017) for natural language processing. DSA attends to informative words with a dynamic weight vector. We achieve new state-of-the-art results among sentence encoding methods in Stanford Natural Language Inference (SNLI) dataset with the least number of parameters, while showing comparative results in Stanford Sentiment Treebank (SST) dataset.