Sanghyeon Lee

CV
h-index44
13papers
59citations
Novelty55%
AI Score56

13 Papers

CVJul 14, 2022
iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Jooyeol Yun, Sanghyeon Lee, Minho Park et al.

Point-interactive image colorization aims to colorize grayscale images when a user provides the colors for specific locations. It is essential for point-interactive colorization methods to appropriately propagate user-provided colors (i.e., user hints) in the entire image to obtain a reasonably colorized image with minimal user effort. However, existing approaches often produce partially colorized results due to the inefficient design of stacking convolutional layers to propagate hints to distant relevant regions. To address this problem, we present iColoriT, a novel point-interactive colorization Vision Transformer capable of propagating user hints to relevant regions, leveraging the global receptive field of Transformers. The self-attention mechanism of Transformers enables iColoriT to selectively colorize relevant regions with only a few local hints. Our approach colorizes images in real-time by utilizing pixel shuffling, an efficient upsampling technique that replaces the decoder architecture. Also, in order to mitigate the artifacts caused by pixel shuffling with large upsampling ratios, we present the local stabilizing layer. Extensive quantitative and qualitative results demonstrate that our approach highly outperforms existing methods for point-interactive colorization, producing accurately colorized images with a user's minimal effort. Official codes are available at https://pmh9960.github.io/research/iColoriT

40.4AIMay 18
LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning

Sangjun Bae, Yisak Park, Sanghyeon Lee et al.

Communication is a key component in multi-agent reinforcement learning (MARL) for mitigating partial observability, yet prior approaches often rely on inefficient information exchange or fail to transmit sufficient state information. To address this, we propose LLM-driven Multi-Agent Communication (LMAC), which leverages an LLM's reasoning capability to design a communication protocol that enables all agents to reconstruct the underlying state as accurately and uniformly as possible. LMAC iteratively refines the protocol using an explicit state-awareness criterion, improving state recovery while narrowing differences in agents' knowledge. Experiments on diverse MARL benchmarks show that LMAC improves state reconstruction across agents and yields substantial performance gains over prior communication baselines.

61.2CVMar 29
OPRO: Orthogonal Panel-Relative Operators for Panel-Aware In-Context Image Generation

Sanghyeon Lee, Minwoo Lee, Euijin Shin et al.

We introduce a parameter-efficient adaptation method for panel-aware in-context image generation with pre-trained diffusion transformers. The key idea is to compose learnable, panel-specific orthogonal operators onto the backbone's frozen positional encodings. This design provides two desirable properties: (1) isometry, which preserves the geometry of internal features, and (2) same-panel invariance, which maintains the model's pre-trained intra-panel synthesis behavior. Through controlled experiments, we demonstrate that the effectiveness of our adaptation method is not tied to a specific positional encoding design but generalizes across diverse positional encoding regimes. By enabling effective panel-relative conditioning, the proposed method consistently improves in-context image-based instructional editing pipelines, including state-of-the-art approaches.

67.5AIApr 1
Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

Hongbeen Kim, Juhyun Lee, Sanghyeon Lee et al.

Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce {\it negative early exit}, which prunes unproductive MCTS trajectories, and an {\it adaptive boosting mechanism} that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.

39.4CVMay 12
PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

Sanghyeon Lee, Jinwoo Kim, Jong Taek Lee

Zero-shot skeleton-based action recognition (ZSSAR) is typically treated as a skeleton-text alignment problem: encode joint-coordinate sequences, align them with language, and classify unseen actions. We argue that this alignment is often too late. Skeletons are not complete action observations, but compressed outputs of human pose estimation (HPE); by the time alignment begins, human-object interactions and pose-relative visual cues may no longer be explicit. We call this upstream semantic loss. To address it, we propose PoseBridge, an HPE-aware ZSSAR framework that bridges intermediate HPE representations to skeleton-text alignment. Rather than adding an RGB action branch or object detector, PoseBridge extracts pose-anchored semantic cues from the same HPE process that produces skeletons, then transfers them through skeleton-conditioned bridging and semantic prototype adaptation. Across NTU-RGB+D 60/120, PKU-MMD, and Kinetics-200/400, PoseBridge improves ZSSAR performance under the evaluated protocols. On the Kinetics-200/400 PURLS benchmark, which contains in-the-wild videos with diverse scenes and action contexts, PoseBridge shows the clearest separation, improving the strongest compared baseline by 13.3-17.4 points across all eight splits. Our code will be publicly released.

CVNov 17, 2024
Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera

Inpyo Song, Sanghyeon Lee, Minjun Joo et al.

Recent advancements in computer vision have led to a renewed interest in developing assistive technologies for individuals with visual impairments. Although extensive research has been conducted in the field of computer vision-based assistive technologies, most of the focus has been on understanding contexts in images, rather than addressing their physical safety and security concerns. To address this challenge, we propose the first step towards detecting anomalous situations for visually impaired people by observing their entire surroundings using an egocentric 360-degree camera. We first introduce a novel egocentric 360-degree video dataset called VIEW360 (Visually Impaired Equipped with Wearable 360-degree camera), which contains abnormal activities that visually impaired individuals may encounter, such as shoulder surfing and pickpocketing. Furthermore, we propose a new architecture called the FDPN (Frame and Direction Prediction Network), which facilitates frame-level prediction of abnormal events and identifying of their directions. Finally, we evaluate our approach on our VIEW360 dataset and the publicly available UCF-Crime and Shanghaitech datasets, demonstrating state-of-the-art performance.

LGMay 23, 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning

Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim et al.

Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation in existing offline RL methods with penalized value function, indicating the potential for underestimation bias due to unnecessary bias introduced in the value function. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods

LGJun 26, 2025
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim et al.

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement, which refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.

LGFeb 6, 2025
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning

Sanghyeon Lee, Sangjun Bae, Yisak Park et al.

Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks.

CVDec 18, 2024
Enabling Region-Specific Control via Lassos in Point-Based Colorization

Sanghyeon Lee, Jooyeol Yun, Jaegul Choo

Point-based interactive colorization techniques allow users to effortlessly colorize grayscale images using user-provided color hints. However, point-based methods often face challenges when different colors are given to semantically similar areas, leading to color intermingling and unsatisfactory results-an issue we refer to as color collapse. The fundamental cause of color collapse is the inadequacy of points for defining the boundaries for each color. To mitigate color collapse, we introduce a lasso tool that can control the scope of each color hint. Additionally, we design a framework that leverages the user-provided lassos to localize the attention masks. The experimental results show that using a single lasso is as effective as applying 4.18 individual color hints and can achieve the desired outcomes in 30% less time than using points alone.

CRAug 26, 2021
Stockade: Hardware Hardening for Distributed Trusted Sandboxes

Joongun Park, Seunghyo Kang, Sanghyeon Lee et al.

The widening availability of hardware-based trusted execution environments (TEEs) has been accelerating the adaptation of new applications using TEEs. Recent studies showed that a cloud application consists of multiple distributed software modules provided by mutually distrustful parties. The applications use multiple TEEs (enclaves) communicating through software-encrypted memory channels. Such execution model requires bi-directional protection: protecting the rest of the system from the enclave module with sandboxing and protecting the enclave module from a third-part module and operating systems. However, the current TEE model, such as Intel SGX, cannot efficiently represent such distributed sandbox applications. To overcome the lack of hardware supports for sandboxed TEEs, this paper proposes an extended enclave model called Stockade, which supports distributed sandboxes hardened by hardware. Stockade proposes new three key techniques. First, it extends the hardware-based memory isolation in SGX to confine a user software module only within its enclave. Second, it proposes a trusted monitor enclave that filters and validates systems calls from enclaves. Finally, it allows hardware-protected memory sharing between a pair of enclaves for efficient protected communication without software-based encryption. Using an emulated SGX platform with the proposed extensions, this paper shows that distributed sandbox applications can be effectively supported with small changes of SGX hardware.

DCAug 25, 2021
Hardware-assisted Trusted Memory Disaggregation for Secure Far Memory

Taekyung Heo, Seunghyo Kang, Sanghyeon Lee et al.

Memory disaggregation provides efficient memory utilization across network-connected systems. It allows a node to use part of memory in remote nodes in the same cluster. Recent studies have improved RDMA-based memory disaggregation systems, supporting lower latency and higher bandwidth than the prior generation of disaggregated memory. However, the current disaggregated memory systems manage remote memory only at coarse granularity due to the limitation of the access validation mechanism of RDMA. In such systems, to support fine-grained remote page allocation, the trustworthiness of all participating systems needs to be assumed, and thus a security breach in a node can propagate to the entire cluster. From the security perspective, the memory-providing node must protect its memory from memory-requesting nodes. On the other hand, the memory-requesting node requires the confidentiality and integrity protection of its memory contents even if they are stored in remote nodes. To address the weak isolation support in the current system, this study proposes a novel hardware-assisted memory disaggregation system. Based on the security features of FPGA, the logic in each per-node FPGA board provides a secure memory disaggregation engine. With its own networks, a set of FPGA-based engines form a trusted memory disaggregation system, which is isolated from the privileged software of each participating node. The secure memory disaggregation system allows fine-grained memory management in memory-providing nodes, while the access validation is guaranteed with the hardware-hardened mechanism. In addition, the proposed system hides the memory access patterns observable from remote nodes, supporting obliviousness. Our evaluation with FPGA implementation shows that such fine-grained secure disaggregated memory is feasible with comparable performance to the latest software-based techniques.

CVJul 4, 2021
Deep Edge-Aware Interactive Colorization against Color-Bleeding Effects

Eungyeup Kim, Sanghyeon Lee, Jeonghoon Park et al.

Deep neural networks for automatic image colorization often suffer from the color-bleeding artifact, a problematic color spreading near the boundaries between adjacent objects. Such color-bleeding artifacts debase the reality of generated outputs, limiting the applicability of colorization models in practice. Although previous approaches have attempted to address this problem in an automatic manner, they tend to work only in limited cases where a high contrast of gray-scale values are given in an input image. Alternatively, leveraging user interactions would be a promising approach for solving this color-breeding artifacts. In this paper, we propose a novel edge-enhancing network for the regions of interest via simple user scribbles indicating where to enhance. In addition, our method requires a minimal amount of effort from users for their satisfactory enhancement. Experimental results demonstrate that our interactive edge-enhancing approach effectively improves the color-bleeding artifacts compared to the existing baselines across various datasets.