Jeonghoon Hong

h-index: 5
2 papers

2 Papers

AI | Nov 17, 2025 | Code
MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements

SeokJoo Kwak, Jihoon Kim, Boyoun Kim et al.

Graphical User Interface (GUI) grounding - the task of mapping natural language instructions to screen coordinates - is essential for autonomous agents and accessibility technologies. Existing systems rely on monolithic models or one-shot pipelines that lack modularity and fail under visual clutter and ambiguous instructions. We introduce MEGA-GUI, a multi-stage framework that separates grounding into coarse Region-of-Interest (ROI) selection and fine-grained element grounding, orchestrated by specialized vision-language agents. MEGA-GUI features a bidirectional ROI zoom algorithm that mitigates spatial dilution and a context-aware rewriting agent that reduces semantic ambiguity. Our analysis reveals complementary strengths and weaknesses across vision-language models at different visual scales, and we show that leveraging this modular structure achieves consistently higher accuracy than monolithic approaches. On the visually dense ScreenSpot-Pro benchmark, MEGA-GUI attains 73.18% accuracy, and on the semantically complex OSWorld-G benchmark it reaches 68.63%, surpassing previously reported results. Code and the Grounding Benchmark Toolkit (GBT) are available at https://github.com/samsungsds-research-papers/mega-gui.
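
The abstract's split into coarse ROI selection followed by fine-grained grounding implies one piece of bookkeeping: a point predicted inside a cropped (and possibly resized) region has to be mapped back to full-screen coordinates before it can be scored. The following is a minimal sketch of that mapping only, not the authors' implementation. The names `Box`, `roi_point_to_screen`, and `is_hit` are hypothetical, and the bidirectional ROI zoom algorithm and the rewriting agent are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in full-screen pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def roi_point_to_screen(roi: Box, point_in_crop, crop_size):
    """Map a point predicted on a resized ROI crop back to full-screen pixels.

    Assumes the crop was produced by cutting `roi` out of the screenshot and
    resizing it to `crop_size` = (width, height) without padding.
    """
    x, y = point_in_crop
    crop_w, crop_h = crop_size
    scale_x = roi.width / crop_w   # screen pixels per crop pixel, horizontally
    scale_y = roi.height / crop_h  # screen pixels per crop pixel, vertically
    return (roi.left + x * scale_x, roi.top + y * scale_y)


def is_hit(point, target: Box) -> bool:
    """Return True if the point lies inside the target element's bounding box."""
    x, y = point
    return target.left <= x <= target.right and target.top <= y <= target.bottom


# Illustrative numbers only: a 400x200 ROI upscaled to an 800x400 crop.
roi = Box(left=1000, top=500, right=1400, bottom=700)
screen_point = roi_point_to_screen(roi, point_in_crop=(400, 200), crop_size=(800, 400))
```

Scoring a prediction as correct when the mapped point falls inside the target element's box is an assumption about the evaluation protocol; the abstract reports accuracy without stating how a hit is defined.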

QUANT-PH | Dec 3, 2024
Reinforcement learning to learn quantum states for Heisenberg scaling accuracy

Jeongwoo Jae, Jeonghoon Hong, Jinho Choo et al.

Learning quantum states is a crucial task for realizing quantum information technology. Recently, neural approaches have emerged as promising methods for learning quantum states. We propose a meta-learning model that utilizes reinforcement learning (RL) to optimize the process of learning quantum states. To improve the data efficiency of the RL, we introduce an action repetition strategy inspired by curriculum learning. The RL agent significantly improves the sample efficiency of learning random quantum states, and achieves infidelity scaling close to the Heisenberg limit. We also show that the RL agent trained using 3-qubit states can generalize to learning up to 5-qubit states. These results highlight the utility of RL-driven meta-learning to enhance the efficiency and generalizability of learning quantum states. Our approach can be applied to improve quantum control, quantum optimization, and quantum machine learning.
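
For readers unfamiliar with the figure of merit, the infidelity mentioned above can be illustrated for pure states as one minus the squared overlap between the learned and target state vectors. This is a small sketch under that assumption: the abstract does not say which fidelity definition or state representation the paper uses, and the function name `infidelity` is hypothetical.

```python
import math


def infidelity(psi, phi):
    """Return 1 - |<psi|phi>|^2 for two normalized pure-state vectors.

    Assumes both inputs are equal-length sequences of complex amplitudes
    that are already normalized.
    """
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return 1.0 - abs(overlap) ** 2


# Single-qubit examples: |0>, |1>, and |+> = (|0> + |1>) / sqrt(2).
zero = [1 + 0j, 0j]
one = [0j, 1 + 0j]
plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]

same = infidelity(zero, zero)        # identical states
half = infidelity(zero, plus)        # partially overlapping states
orthogonal = infidelity(zero, one)   # orthogonal states
```

An infidelity of zero means the learned state matches the target exactly, so a scaling claim describes how quickly this quantity shrinks as more measurement resources are used.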