Bongwon Suh

HC
h-index: 5
9 papers
103 citations
Novelty: 46%
AI Score: 49

9 Papers

IR Mar 26, 2022
Data Augmentation Strategies for Improving Sequential Recommender Systems

Joo-yeong Song, Bongwon Suh

Sequential recommender systems have recently achieved significant performance improvements by exploiting deep learning (DL) based methods. However, although various DL-based methods have been introduced, most of them focus only on transforming the network structure and neglect other influential factors, including data augmentation. DL-based models require a large amount of training data to estimate parameters well and achieve high performance, which motivated early efforts to enlarge training data through data augmentation in the computer vision and speech domains. In this paper, we investigate whether various data augmentation strategies can improve the performance of sequential recommender systems, especially when the training dataset is not large enough. To this end, we propose a simple set of data augmentation strategies, all of which transform the original item sequences by direct corruption, and describe how data augmentation changes performance. Extensive experiments on the latest DL-based model show that applying data augmentation helps the model generalize better and can significantly boost model performance, especially when the amount of training data is small. Furthermore, our proposed strategies improve performance to a level better than or competitive with existing strategies suggested in prior work.
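
The abstract does not list the individual strategies, so the following is only a minimal sketch of what "direct corruption" of an item sequence could look like. The three operations (cropping, masking, noise injection), the function names, the `mask_id` of 0, and the item-id range are all assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import List


def crop(seq: List[int], ratio: float, rng: random.Random) -> List[int]:
    """Keep one contiguous window covering roughly `ratio` of the items."""
    if not seq:
        return []
    keep = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - keep)  # randint bounds are inclusive
    return seq[start:start + keep]


def mask(seq: List[int], prob: float, rng: random.Random,
         mask_id: int = 0) -> List[int]:
    """Replace each item with `mask_id` independently with probability `prob`."""
    return [mask_id if rng.random() < prob else item for item in seq]


def inject_noise(seq: List[int], num_items: int,
                 rng: random.Random) -> List[int]:
    """Insert one random item id (1..num_items) at a random position."""
    pos = rng.randint(0, len(seq))
    noise = rng.randint(1, num_items)
    return seq[:pos] + [noise] + seq[pos:]


def augment(sequences: List[List[int]], num_items: int,
            copies: int = 2, seed: int = 0) -> List[List[int]]:
    """Return the original sequences plus `copies` corrupted variants of each."""
    rng = random.Random(seed)
    # Hypothetical default settings; real values would need tuning.
    ops = [
        lambda s: crop(s, 0.8, rng),
        lambda s: mask(s, 0.2, rng),
        lambda s: inject_noise(s, num_items, rng),
    ]
    out = [list(s) for s in sequences]
    for seq in sequences:
        for _ in range(copies):
            out.append(rng.choice(ops)(seq))
    return out


if __name__ == "__main__":
    history = [[3, 7, 7, 12, 5, 9]]
    for s in augment(history, num_items=20):
        print(s)
```

Under these assumptions, a small training set grows by a factor of `copies + 1`, which is the setting in which the abstract reports the largest gains.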

CL Jun 28, 2025
DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues

Kyochul Jang, Donghyeon Lee, Kyusik Kim et al.

Existing function-calling benchmarks focus on single-turn interactions. However, they overlook the complexity of real-world scenarios. To quantify how existing benchmarks address practical applications, we introduce DICE-SCORE, a metric that evaluates the dispersion of tool-related information such as function name and parameter values throughout the dialogue. Analyzing existing benchmarks through DICE-SCORE reveals notably low scores, highlighting the need for more realistic scenarios. To address this gap, we present DICE-BENCH, a framework that constructs practical function-calling datasets by synthesizing conversations through a tool graph that maintains dependencies across rounds and a multi-agent system with distinct personas to enhance dialogue naturalness. The final dataset comprises 1,607 high-DICE-SCORE instances. Our experiments on 19 LLMs with DICE-BENCH show that significant advances are still required before such models can be deployed effectively in real-world settings. Our code and data are all publicly available: https://snuhcc.github.io/DICE-Bench/.

MA Mar 2, 2025
LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding

Seungbae Seo, Junghwan Kim, Minjeong Shin et al.

Multi-Agent Pathfinding (MAPF) is a core challenge in multi-agent systems. Existing learning-based MAPF methods often struggle with scalability, particularly when addressing complex scenarios that are prone to deadlocks. To address these challenges, we introduce LLMDR (LLM-Driven Deadlock Detection and Resolution), an approach designed to resolve deadlocks and improve the performance of learnt MAPF models. LLMDR integrates the inference capabilities of large language models (LLMs) with learnt MAPF models and prioritized planning, enabling it to detect deadlocks and provide customized resolution strategies. We evaluate LLMDR on standard MAPF benchmark maps with varying agent numbers, measuring its performance when combined with several base models. The results demonstrate that LLMDR improves the performance of learnt MAPF models, particularly in deadlock-prone scenarios, with notable improvements in success rates. These findings show the potential of integrating LLMs to improve the scalability of learning-based MAPF methods. The source code for LLMDR is available at: https://github.com/ssbacc/llmdr-dhc

CY Oct 4, 2023
Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions

Jaeyoun You, Bongwon Suh

The adoption of generative AI technologies is swiftly expanding. Services employing both linguistic and multimodal models are evolving, offering users increasingly precise responses. Consequently, human reliance on these technologies is expected to grow rapidly. On the premise that people will be affected by the output of AI, we explored approaches to help AI produce better output. Initially, we evaluated how contemporary AI services competitively meet user needs, then examined society's depiction as mirrored by Large Language Models (LLMs). We conducted a query experiment, asking about social conventions in various countries and eliciting one-word responses. We compared the LLMs' value judgments with public data and suggested a model of decision-making in value-conflicting scenarios that could be adopted for future machine value judgments. This paper advocates a practical approach to using AI as a tool for investigating other remote worlds. This research is significant in implicitly rejecting the notion of AI making value judgments and instead arguing for a more critical perspective on the environment that defers judgmental capabilities to individuals. We anticipate this study will empower anyone, regardless of their capacity, to receive safe and accurate value judgment-based outputs effectively.

HC Mar 13
"I Should Know, But I Dare Not Ask": From Understanding Challenges in Patient Journeys to Deriving Design Implications for North Korean Defectors' Adaptation

Hyungwoo Song, Jeongha Kim, Minju Kim et al.

While it is known that North Korean defectors (NKDs) struggle with South Korea's healthcare system, the specific challenges of their patient journey remain underexplored. To investigate this, we conducted interviews with 10 NKDs about an 8-step patient journey and identified the clinical consultation step as a critical barrier for all participants, marked by three key challenges: expressing symptoms, managing social and cultural concerns, and overcoming language differences. In response, we developed Medibridge, a mobile prototype that allows users to rehearse with an AI doctor before a real hospital visit to generate a tangible "Helper Note" for their actual consultation. Our evaluation with 15 NKDs showed improvements in perceived communication capability, including greater expression clarity, reduced social and cultural concerns, and enhanced linguistic confidence. Our contributions include an empirical understanding of NKDs' healthcare challenges, a novel AI-powered rehearsal system that prepares users for real-world clinical communication, and design implications for inclusive technologies for displaced populations.

CL Jun 17, 2025
MAS-LitEval: Multi-Agent System for Literary Translation Quality Assessment

Junghwan Kim, Kieun Park, Sohee Park et al.

Literary translation requires preserving cultural nuances and stylistic elements, which traditional metrics like BLEU and METEOR fail to assess due to their focus on lexical overlap. This oversight neglects the narrative consistency and stylistic fidelity that are crucial for literary works. To address this, we propose MAS-LitEval, a multi-agent system using Large Language Models (LLMs) to evaluate translations based on terminology, narrative, and style. We tested MAS-LitEval on translations of The Little Prince and A Connecticut Yankee in King Arthur's Court, generated by various LLMs, and compared it to traditional metrics. MAS-LitEval outperformed these metrics, with top models scoring up to 0.890 in capturing literary nuances. This work introduces a scalable, nuanced framework for Translation Quality Assessment (TQA), offering a practical tool for translators and researchers.

HC Sep 2, 2021
Applying the Persona of User's Family Member and the Doctor to the Conversational Agents for Healthcare

Youjin Hwang, Donghoon Shin, Sion Baek et al.

Conversational agents have shown many opportunities in healthcare by taking over tasks that used to be done by humans. One of the major functions of a conversational healthcare agent is intervening in users' daily behaviors. In this case, forming an intimate and trusting relationship with users is one of the major issues. Factors affecting the human-agent relationship should be explored in depth to improve long-term acceptance of healthcare agents. Even though many ideas and studies have been proposed to increase the acceptance of conversational agents in healthcare, challenges remain. Based on preliminary work we conducted, we suggest applying the personas of users' family members and of the doctor, who already have a real-world relationship with the users, as a solution for forming a solid relationship between humans and the chatbot.

ML Nov 3, 2019
Enhancing VAEs for Collaborative Filtering: Flexible Priors & Gating Mechanisms

Daeryong Kim, Bongwon Suh

Neural network based models for collaborative filtering (CF) have started to gain attention recently. One branch of research uses deep generative models to model user preferences, where variational autoencoders (VAEs) were shown to produce state-of-the-art results. However, the current variational autoencoder for CF has some potentially problematic characteristics. The first is the overly simplistic prior that VAEs incorporate for learning the latent representations of user preference. The other is the model's inability to learn deeper representations with more than one hidden layer for each network. Our goal is to incorporate appropriate techniques to mitigate these problems of variational autoencoder CF and further improve recommendation performance. Our work is the first to apply flexible priors to collaborative filtering and to show that simple priors (in original VAEs) may be too restrictive to fully model user preferences, and that setting a more flexible prior gives significant gains. We experiment with the VampPrior, originally proposed for image generation, to examine the effect of flexible priors in CF. We also show that VampPriors coupled with gating mechanisms outperform SOTA results, including the Variational Autoencoder for Collaborative Filtering, by meaningful margins on two popular benchmark datasets (MovieLens & Netflix).

HC Aug 1, 2016
Exploring the Front Touch Interface for Virtual Reality Headsets

Jihyun Lee, Byungmoon Kim, Bongwon Suh et al.

In this paper, we propose a new interface for virtual reality headsets: a touchpad on the front of the headset. To demonstrate the feasibility of the front touch interface, we built a prototype device, explored how it expands the VR UI design space, and performed various user studies. We started with preliminary tests to see how intuitively and accurately people can interact with the front touchpad. We then experimented with various user interfaces such as a binary selection, a typical menu layout, and a keyboard. Two-Finger and Drag-n-Tap were also explored to find the appropriate selection technique. As a low-cost, lightweight, and low-power technology, a touch sensor can make an ideal interface for mobile headsets. Also, the front touch area can be large enough to allow a wide range of interaction types such as multi-finger interactions. With this novel front touch interface, we pave the way for new virtual reality interaction methods.