Jaesang Yu

AI
h-index: 13
3 papers
8 citations
Novelty: 65%
AI Score: 48

3 Papers

HC May 10
IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI

DaEun Choi, Kihoon Son, Jaesang Yu et al.

While designers increasingly leverage Generative AI for divergent exploration, current interaction is optimized for convergent refinement, forcing users to specify fixed targets rather than open-ended search spaces. Based on a formative study (N=7), we define the anatomy of Divergent Intent, comprising property, direction, and range, and identify two critical barriers: the lack of mechanisms to explicitly shape the parametric boundaries of exploration and the difficulty of reusing successful search strategies. We present IdeaBlocks, where users can modularize divergent intents into Exploration Blocks. Users can reuse prior intents at multiple levels (block, path, and project) with options for literal or context-adaptive reuse. In our comparative study (N=12), participants using IdeaBlocks explored 2.13 times more images with 12.5% greater visual diversity than the baseline, demonstrating how structured intent expression and reuse support effective divergence. A three-day deployment study (N=6) further revealed how different reuse mechanisms enabled distinct creative strategies, offering design implications for future intent-aware creativity support.

CV Mar 26
GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Saelyne Yang, Jaesang Yu, Yi-Hao Peng et al.

Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm overlooks human intention: users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why. We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications. GUIDE defines three tasks, (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction, that test a model's ability to recognize behavior state, reason about goals, and decide when and how to help. Evaluations across eight state-of-the-art multimodal models reveal that all models struggled, achieving only 44.6% and 55.0% accuracy on behavior state detection and help prediction, respectively. However, providing user context significantly improved performance, raising help prediction by up to 50.2pp and highlighting the critical role of structured user understanding in effective assistance. Our dataset is available at https://guide-bench.github.io.

AI Feb 3
DiscoverLLM: From Executing Intents to Discovering Them

Tae Soo Kim, Yoonjoo Lee, Jaesang Yu et al.

To handle ambiguous and open-ended requests, Large Language Models (LLMs) are increasingly trained to interact with users to surface intents they have not yet expressed (e.g., by asking clarification questions). However, users are often ambiguous because they have not yet formed their intents: they must observe and explore outcomes to discover what they want. Simply asking "what kind of tone do you want?" fails when users themselves do not know. We introduce DiscoverLLM, a novel and generalizable framework that trains LLMs to help users form and discover their intents. Central to our approach is a novel user simulator that models cognitive state with a hierarchy of intents that progressively concretize as the model surfaces relevant options; the degree of concretization serves as a reward signal that models can be trained to optimize. The resulting models learn to collaborate with users by adaptively diverging (i.e., exploring options) when intents are unclear, and converging (i.e., refining and implementing) when intents concretize. Across the proposed interactive benchmarks in creative writing, technical writing, and SVG drawing, DiscoverLLM achieves over 10% higher task performance while reducing conversation length by up to 40%. In a user study with 75 human participants, DiscoverLLM improved conversation satisfaction and efficiency compared to baselines.