Dooyoung Kim

CL
h-index: 10
7 papers
41 citations
Novelty: 46%
AI Score: 50

7 Papers

CV Feb 3
SceneLinker: Compositional 3D Scene Generation via Semantic Scene Graph from RGB Sequences

Seok-Young Kim, Dooyoung Kim, Woojin Cho et al.

We introduce SceneLinker, a novel framework that generates compositional 3D scenes via a semantic scene graph from RGB sequences. To adaptively experience Mixed Reality (MR) content based on each user's space, it is essential to generate a 3D scene that reflects the real-world layout by compactly capturing the semantic cues of the surroundings. Prior works struggled to fully capture the contextual relationships between objects or mainly focused on synthesizing diverse shapes, making it challenging to generate 3D scenes aligned with object arrangements. We address these challenges by designing a graph network with cross-check feature attention for scene graph prediction and constructing a graph-variational autoencoder (graph-VAE), which consists of a joint shape and layout block for 3D scene generation. Experiments on the 3RScan/3DSSG and SG-FRONT datasets demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations, even in complex indoor environments and under challenging scene graph constraints. Our work enables users to generate consistent 3D spaces from their physical environments via scene graphs, allowing them to create spatial MR content. The project page is https://scenelinker2026.github.io.

CV Mar 9
Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality

Taewook Ha, Woojin Cho, Dooyoung Kim et al.

We propose Int3DNet, a scene-aware network that predicts 3D intention areas directly from scene geometry and head-hand motion cues, enabling robust human intention prediction without explicit object-level perception. In Mixed Reality (MR), intention prediction is critical because it enables the system to anticipate user actions and respond proactively, reducing interaction delays and ensuring seamless user experiences. Our method employs cross-attention fusion of sparse motion cues and scene point clouds, offering a novel approach that directly interprets the user's spatial intention within the scene. We evaluated Int3DNet on the MoGaze and CIRCLE datasets, which are public datasets for full-body human-scene interactions, showing consistent performance across time horizons of up to 1500 ms and outperforming the baselines, even in diverse and unseen scenes. Moreover, we demonstrate the usability of the proposed method through efficient visual question answering (VQA) based on intention areas. Int3DNet provides reliable 3D intention areas derived from head-hand motion and scene geometry, thus enabling seamless interaction between humans and MR systems through proactive processing of intention areas.
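The cross-attention fusion mentioned in this abstract can be illustrated with plain scaled dot-product attention. The sketch below is a generic illustration, not Int3DNet's architecture: it assumes motion-cue features act as queries and scene-point features as keys and values, and it omits the learned projections a real network would use. All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention without learned projections.

    Assumption: queries come from sparse motion cues, while keys and
    values come from scene point features. Returns (outputs, weights).
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this motion query to every scene point.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Attention-weighted average of the scene point values.
        dims = range(len(values[0]))
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in dims])
    return outputs, weights
```

Each output row is a scene summary weighted toward the points most similar to the corresponding motion query.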

CL Nov 3, 2025
ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue Generation

Seungmin Shin, Dooyoung Kim, Youngjoong Ko

Controllable Dialogue Generation (CDG) enables chatbots to generate responses with desired attributes, and weighted decoding methods have achieved significant success in the CDG task. However, using a fixed constant value to manage the bias of attribute probabilities makes it challenging to find an ideal control strength that satisfies both controllability and fluency. To address this issue, we propose ECO decoding (Entropy-based COntrol), which dynamically adjusts the control strength at each generation step according to the entropy of both the language model and attribute classifier probability distributions. Experiments on the DailyDialog and MultiWOZ datasets demonstrate that ECO decoding consistently improves controllability while maintaining fluency and grammaticality, outperforming prior decoding methods across various models and settings. Furthermore, ECO decoding alleviates probability interpolation issues in multi-attribute generation and consequently demonstrates strong performance in both single- and multi-attribute scenarios.
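To make the idea of entropy-dependent control strength concrete, here is a minimal sketch of one weighted decoding step. The abstract does not give the actual ECO formula, so the rule in `control_strength` is an assumption made purely for illustration: control grows when the language model is uncertain and when the attribute classifier discriminates sharply between candidate tokens. All function names and the `base_strength` parameter are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy divided by log(len(probs)), so the result is in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def control_strength(lm_probs, attr_probs, base_strength=2.0):
    """Hypothetical rule, NOT the formula from the paper."""
    lm_uncertainty = normalized_entropy(lm_probs)
    attr_sharpness = 1.0 - normalized_entropy(normalize(attr_probs))
    return base_strength * lm_uncertainty * attr_sharpness

def weighted_decode_step(lm_probs, attr_probs, base_strength=2.0):
    """Reweight next-token probabilities by attribute scores raised to lambda."""
    lam = control_strength(lm_probs, attr_probs, base_strength)
    scores = [p * (a ** lam) for p, a in zip(lm_probs, attr_probs)]
    return normalize(scores), lam
```

With a fixed constant in place of `lam`, this reduces to ordinary weighted decoding, which is the baseline behavior the abstract describes as hard to tune.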

HC Mar 8
Task Breakpoint Generation using Origin-Centric Graph in Virtual Reality Recordings for Adaptive Playback

Selin Choi, Dooyoung Kim, Taewook Ha et al.

We propose a method for generating task breakpoints based on an Origin-Centric Graph (OCG) to segment goal-oriented activity recordings into task units for adaptive playback in Virtual Reality (VR) environments. With the development of Augmented Reality (AR)/VR head-mounted displays (HMDs), research on adaptive tutorials and authoring tools has become active, but existing task segmentation methods mainly rely on manual annotation or are restricted to 2D video, which limits their applicability to 3D VR contexts. In our approach, assembly scenarios with clearly defined task boundaries are recorded using a structured spatio-temporal scene graph (STSG), and the OCG is employed to track changes in the central object and the formation of new groups, thereby generating task breakpoints automatically. A user study collected user-perceived task breakpoints to establish ground truth (GT), and a comparison with the algorithm-detected breakpoints demonstrated high agreement and confirmed the method's accuracy in supporting adaptive playback. The proposed task segmentation method provides a foundation for dynamically adjusting VR playback according to user proficiency and progress, with potential for extension into automatic timeline segmentation systems for diverse VR recordings.
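The two signals named in this abstract, a change in the central object and the formation of a new group, can be sketched over a sequence of per-frame graphs. The abstract does not define how the OCG picks its central object, so this sketch makes two labeled assumptions: each frame is a list of attachment edges between object names, and the central object is the most connected node (ties broken alphabetically). Function names are hypothetical.

```python
from collections import defaultdict

def summarize(edges):
    """Return (central_object, number_of_groups) for one frame's edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    if not adjacency:
        return None, 0
    # Assumption: the central object is the most connected node.
    central = max(sorted(adjacency), key=lambda n: len(adjacency[n]))
    # Count connected components (groups) with a depth-first search.
    seen, groups = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        groups += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
    return central, groups

def task_breakpoints(frames):
    """Indices of frames where the central object changes or a group appears."""
    if not frames:
        return []
    breakpoints = []
    prev_central, prev_groups = summarize(frames[0])
    for index in range(1, len(frames)):
        central, groups = summarize(frames[index])
        if central != prev_central or groups > prev_groups:
            breakpoints.append(index)
        prev_central, prev_groups = central, groups
    return breakpoints
```

Merging two groups lowers the group count and therefore does not trigger a breakpoint in this sketch; whether the actual method treats merges that way is not stated in the abstract.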

CL Mar 17, 2025
DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification

Cho Hyeonsu, Dooyoung Kim, Youngjoong Ko

There have been attempts to use linear probes for detoxification, with existing studies relying on a single toxicity probe vector to reduce toxicity. However, toxicity can be divided into various fine-grained subcategories, making it difficult to remove certain types of toxicity with a single toxicity probe vector. To address this limitation, we propose a category-specific toxicity probe vector approach. First, we train multiple toxicity probe vectors for different toxicity categories. During generation, we dynamically select the most relevant toxicity probe vector based on the current context. Finally, the selected vector is dynamically scaled and subtracted from the model. Our method successfully mitigated toxicity from categories that the single probe vector approach failed to detoxify. Experiments demonstrate that our approach achieves up to a 78.52% reduction in toxicity on the evaluation dataset, while fluency remains nearly unchanged, with only a 0.052% drop compared to the unsteered model.
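The select, scale, and subtract steps described in this abstract can be sketched on a single hidden-state vector. The selection and scaling rules below are assumptions for illustration only: the probe whose direction has the largest projection onto the hidden state is chosen, and the subtraction is scaled by that projection (clipped at zero). The abstract does not specify either rule, and all names, including `alpha`, are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def select_probe(hidden, probes):
    """Pick the category whose probe direction projects most onto `hidden`."""
    best = max(probes, key=lambda c: dot(hidden, unit(probes[c])))
    return best, unit(probes[best])

def intervene(hidden, probes, alpha=1.0):
    """Subtract the selected probe direction, scaled by its projection.

    Assumption: the scale is alpha * max(0, projection), so a hidden state
    that does not point along any toxicity direction is left unchanged.
    """
    category, direction = select_probe(hidden, probes)
    strength = alpha * max(0.0, dot(hidden, direction))
    steered = [h - strength * d for h, d in zip(hidden, direction)]
    return category, steered
```

For example, with probes `{"insult": [1.0, 0.0, 0.0], "threat": [0.0, 1.0, 0.0]}` and hidden state `[0.2, 3.0, 1.0]`, the sketch selects `"threat"` and removes only the component along that direction.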

LG Jun 2, 2024
Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction

Yunhyeok Kwak, Inwoo Hwang, Dooyoung Kim et al.

Monte Carlo Tree Search (MCTS) has showcased its efficacy across a broad spectrum of decision-making problems. However, its performance often degrades under a vast combinatorial action space, especially where an action is composed of multiple sub-actions. In this work, we propose an action abstraction based on the compositional structure between a state and sub-actions for improving the efficiency of MCTS under a factored action space. Our method learns a latent dynamics model with an auxiliary network that captures sub-actions relevant to the transition on the current state, which we call state-conditioned action abstraction. Notably, it infers such compositional relationships from high-dimensional observations without a known environment model. During the tree traversal, our method constructs the state-conditioned action abstraction for each node on-the-fly, reducing the search space by discarding the exploration of redundant sub-actions. Experimental results demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from an expansive action space.
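The search-space reduction described here can be illustrated by enumerating candidate actions at a single node. In the paper the relevance of each sub-action is inferred by a learned auxiliary network; in this sketch it is simply passed in as a boolean mask, and irrelevant sub-actions are pinned to an arbitrary default value. The function name and the `default` parameter are hypothetical.

```python
from itertools import product

def abstract_actions(sub_action_sizes, relevant, default=0):
    """Enumerate candidate actions for one node under a factored action space.

    sub_action_sizes: number of choices for each sub-action.
    relevant: per-sub-action booleans for the current state (assumed given
        here; the paper learns them with an auxiliary network).
    Irrelevant sub-actions collapse to `default`, so they add no branching.
    """
    choices = [
        range(size) if keep else (default,)
        for size, keep in zip(sub_action_sizes, relevant)
    ]
    return list(product(*choices))
```

With three sub-actions of three choices each, the full space has 27 actions; marking the middle sub-action as irrelevant leaves 9 candidates to expand.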

HC Jan 12, 2022
Effects of Virtual Room Size and Objects on Relative Translation Gain Thresholds in Redirected Walking

Dooyoung Kim, Jinwook Kim, Jae-eun Shin et al.

This paper investigates how the size of virtual space and objects within it affect the threshold range of relative translation gains, a Redirected Walking (RDW) technique that scales the user's movement in virtual space in different ratios for the width and depth. While previous studies assert that a virtual room's size affects relative translation gain thresholds on account of the virtual horizon's location, additional research is needed to explore this assumption through a structured approach to visual perception in Virtual Reality (VR). We estimate the relative translation gain thresholds in six spatial conditions configured by three room sizes and the presence of virtual objects (3 × 2), which were set according to differing Angles of Declination (AoDs) between the eye gaze and the forward gaze. Results show that both size and virtual objects significantly affect the threshold range, which is greater in the large-sized and furnished conditions. This indicates that the effect of relative translation gains can be further increased by constructing a perceived virtual movable space that is even larger than the adjusted virtual movable space and placing objects in it. Our study can be applied to adjust virtual spaces in synchronizing heterogeneous spaces without coordinate distortion where real and virtual objects can be leveraged to create realistic mutual spaces.
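Scaling movement by different ratios for width and depth can be written out as a small mapping from real to virtual displacement. The sketch below assumes the width and depth axes are the x and z axes of the tracking space and summarizes the pair of gains as the ratio of width gain to depth gain; both choices, the function names, and any threshold values passed in are illustrative assumptions rather than values from the paper.

```python
def apply_translation_gains(real_dx, real_dz, gain_width, gain_depth):
    """Map a real-world displacement to a virtual one, per axis.

    Assumption: x is the width axis and z is the depth axis.
    """
    return real_dx * gain_width, real_dz * gain_depth

def relative_gain(gain_width, gain_depth):
    """Ratio of the width gain to the depth gain (assumed definition)."""
    return gain_width / gain_depth

def is_within_thresholds(ratio, lower, upper):
    """Check a ratio against a threshold range supplied by the caller."""
    return lower <= ratio <= upper
```

For instance, `apply_translation_gains(1.0, 2.0, 1.2, 1.0)` stretches sideways movement by 20% while leaving forward movement unscaled.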