Yongsung Kim

CV
h-index8
5papers
11citations
Novelty36%
AI Score41

5 Papers

48.0CVMar 26Code
HeSS: Head Sensitivity Score for Sparsity Redistribution in VGGT

Yongsung Kim, Wooseok Song, Jaihyun Lew et al.

Visual Geometry Grounded Transformer (VGGT) has advanced 3D vision, yet its global attention layers suffer from quadratic computational costs that hinder scalability. Several sparsification-based acceleration techniques have been proposed to alleviate this issue, but they often suffer from substantial accuracy degradation. We hypothesize that the accuracy degradation stems from the heterogeneity in head-wise sparsification sensitivity, as the existing methods apply a uniform sparsity pattern across all heads. Motivated by this hypothesis, we present a two-stage sparsification pipeline that effectively quantifies and exploits headwise sparsification sensitivity. In the first stage, we measure head-wise sparsification sensitivity using a novel metric, the Head Sensitivity Score (HeSS), which approximates the Hessian with respect to two distinct error terms on a small calibration set. In the inference stage, we perform HeSS-Guided Sparsification, leveraging the pre-computed HeSS to reallocate the total attention budget-assigning denser attention to sensitive heads and sparser attention to more robust ones. We demonstrate that HeSS effectively captures head-wise sparsification sensitivity and empirically confirm that attention heads in the global attention layers exhibit heterogeneous sensitivity characteristics. Extensive experiments further show that our method effectively mitigates performance degradation under high sparsity, demonstrating strong robustness across varying sparsification levels. Code is available at https://github.com/libary753/HeSS.

CVFeb 3
A generalizable large-scale foundation model for musculoskeletal radiographs

Shinn Kim, Soobin Lee, Kyoungseob Shin et al.

Artificial intelligence (AI) has shown promise in detecting and characterizing musculoskeletal diseases from radiographs. However, most existing models remain task-specific, annotation-dependent, and limited in generalizability across diseases and anatomical regions. Although a generalizable foundation model trained on large-scale musculoskeletal radiographs is clinically needed, publicly available datasets remain limited in size and lack sufficient diversity to enable training across a wide range of musculoskeletal conditions and anatomical sites. Here, we present SKELEX, a large-scale foundation model for musculoskeletal radiographs, trained using self-supervised learning on 1.2 million diverse, condition-rich images. The model was evaluated on 12 downstream diagnostic tasks and generally outperformed baselines in fracture detection, osteoarthritis grading, and bone tumor classification. Furthermore, SKELEX demonstrated zero-shot abnormality localization, producing error maps that identified pathologic regions without task-specific training. Building on this capability, we developed an interpretable, region-guided model for predicting bone tumors, which maintained robust performance on independent external datasets and was deployed as a publicly accessible web application. Overall, SKELEX provides a scalable, label-efficient, and generalizable AI framework for musculoskeletal imaging, establishing a foundation for both clinical translation and data-efficient research in musculoskeletal radiology.

CVDec 19, 2024
Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation

Yongsung Kim, Minjun Park, Jooyoung Choi et al.

Recent learning-based Multi-View Stereo models have demonstrated state-of-the-art performance in sparse-view 3D reconstruction. However, directly applying 3D Gaussian Splatting (3DGS) as a refinement step following these models presents challenges. We hypothesize that the excessive positional degrees of freedom (DoFs) in Gaussians induce geometry distortion, fitting color patterns at the cost of structural fidelity. To address this, we propose reprojection-based DoF separation, a method distinguishing positional DoFs in terms of uncertainty: image-plane-parallel DoFs and ray-aligned DoF. To independently manage each DoF, we introduce a reprojection process along with tailored constraints for each DoF. Through experiments across various datasets, we confirm that separating the positional DoFs of Gaussians and applying targeted constraints effectively suppresses geometric artifacts, producing reconstruction results that are both visually and geometrically plausible.

CLMay 18, 2025
Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey

Md Mehrab Tanjim, Yeonjun In, Xiang Chen et al.

Ambiguity remains a fundamental challenge in Natural Language Processing (NLP) due to the inherent complexity and flexibility of human language. With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications. In the context of Conversational Question Answering (CQA), this paper explores the definition, forms, and implications of ambiguity for language driven systems, particularly in the context of LLMs. We define key terms and concepts, categorize various disambiguation approaches enabled by LLMs, and provide a comparative analysis of their advantages and disadvantages. We also explore publicly available datasets for benchmarking ambiguity detection and resolution techniques and highlight their relevance for ongoing research. Finally, we identify open problems and future research directions, especially in agentic settings, proposing areas for further investigation. By offering a comprehensive review of current research on ambiguities and disambiguation with LLMs, we aim to contribute to the development of more robust and reliable LLM-based systems.

HCAug 1, 2018
Studying Preferences and Concerns about Information Disclosure in Email Notifications

Yongsung Kim, Adam Fourney, Ece Kamar

The proliferation of network-connected devices and applications has resulted in people receiving dozens, or hundreds, of notifications per day. When people are in the presence of others, each notification poses some risk of accidental information disclosure; onlookers may see notifications appear above the lock screen of a mobile phone, on the periphery of a desktop or laptop display, or projected onscreen during a presentation. In this paper, we quantify the prevalence of these accidental disclosures in the context of email notifications, and we study people's relevant preferences and concerns. Our results are compiled from an exploratory retrospective survey of 131 respondents, and a separate contextual-labeling study in which 169 participants labeled 1,040 meeting-email pairs. We find that, for 53% of people, at least 1 in 10 email notifications poses an information disclosure risk. We also find that the real or perceived severity of these risks depend both on user characteristics and attributes of the meeting or email (e.g. the number of recipients or attendees). We conclude by exploring machine learning algorithms to predict people's comfort levels given an email notification and a context, then we present implications for the design of future contextually-relevant notification systems.