Seunghwan Kim

CV
h-index3
7papers
5citations
Novelty47%
AI Score43

7 Papers

LGSep 14, 2024Code
Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder

Seunghwan Kim, Seungkyu Lee

Variational autoencoder (VAE) is an established generative model but is notorious for its blurriness. In this work, we investigate the blurry output problem of VAE and resolve it, exploiting the variance of Gaussian decoder and $β$ of beta-VAE. Specifically, we reveal that the indistinguishability of decoder variance and $β$ hinders appropriate analysis of the model by random likelihood value, and limits performance improvement by omitting the gain from $β$. To address the problem, we propose Beta-Sigma VAE (BS-VAE) that explicitly separates $β$ and decoder variance $σ^2_x$ in the model. Our method demonstrates not only superior performance in natural image synthesis but also controllable parameters and predictable analysis compared to conventional VAE. In our experimental evaluation, we employ the analysis of rate-distortion curve and proxy metrics on computer vision datasets. The code is available on https://github.com/overnap/BS-VAE

CVApr 17
Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling

Sanghyeok Chu, Pyunghwan Ahn, Gwangmo Song et al.

Sparse Upcycling provides an efficient way to initialize a Mixture-of-Experts (MoE) model from pretrained dense weights instead of training from scratch. However, since all experts start from identical weights and the router is randomly initialized, the model suffers from expert symmetry and limited early specialization. We propose Cluster-aware Upcycling, a strategy that incorporates semantic structure into MoE initialization. Our method first partitions the dense model's input activations into semantic clusters. Each expert is then initialized using the subspace representations of its corresponding cluster via truncated SVD, while setting the router's initial weights to the cluster centroids. This cluster-aware initialization breaks expert symmetry and encourages early specialization aligned with the data distribution. Furthermore, we introduce an expert-ensemble self-distillation loss that stabilizes training by providing reliable routing guidance using an ensemble teacher. When evaluated on CLIP ViT-B/32 and ViT-B/16, Cluster-aware Upcycling consistently outperforms existing methods across both zero-shot and few-shot benchmarks. The proposed method also produces more diverse and disentangled expert representations, reduces inter-expert similarity, and leads to more confident routing behavior. Project page: https://sanghyeokchu.github.io/cluster-aware-upcycling/

CVAug 19, 2024
Investigating the diversity and stylization of contemporary user generated visual arts in the complexity entropy plane

Seunghwan Kim, Byunghwee Lee, Wonjae Lee

The advent of computational and numerical methods in recent times has provided new avenues for analyzing art historiographical narratives and tracing the evolution of art styles therein. Here, we investigate an evolutionary process underpinning the emergence and stylization of contemporary user-generated visual art styles using the complexity-entropy (C-H) plane, which quantifies local structures in paintings. Informatizing 149,780 images curated in DeviantArt and Behance platforms from 2010 to 2020, we analyze the relationship between local information of the C-H space and multi-level image features generated by a deep neural network and a feature extraction algorithm. The results reveal significant statistical relationships between the C-H information of visual artistic styles and the dissimilarities of the multi-level image features over time within groups of artworks. By disclosing a particular C-H region where the diversity of image representations is noticeably manifested, our analyses reveal an empirical condition of emerging styles that are both novel in the C-H plane and characterized by greater stylistic diversity. Our research shows that visual art analyses combined with physics-inspired methodologies and machine learning, can provide macroscopic insights into quantitatively mapping relevant characteristics of an evolutionary process underpinning the creative stylization of uncharted visual arts of given groups and time.

CLNov 10, 2023
Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs

Benjamin C. Warner, Thomas Kannampallil, Seunghwan Kim

EHR audit logs are a highly granular stream of events that capture clinician activities, and is a significant area of interest for research in characterizing clinician workflow on the electronic health record (EHR). Existing techniques to measure the complexity of workflow through EHR audit logs (audit logs) involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of a EHR session. We briefly evaluate the usage of transformer-based tabular language model (tabular LM) in measuring the entropy or disorderedness of action sequences within workflow and release the evaluated models publicly.

ROJan 20
Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning

Myong-Yol Choi, Hankyoul Ko, Hanse Cho et al.

This paper presents a deep reinforcement learning (DRL) based controller for collective navigation of unmanned aerial vehicle (UAV) swarms in communication-denied environments, enabling robust operation in complex, obstacle-rich environments. Inspired by biological swarms where informed individuals guide groups without explicit communication, we employ an implicit leader-follower framework. In this paradigm, only the leader possesses goal information, while follower UAVs learn robust policies using only onboard LiDAR sensing, without requiring any inter-agent communication or leader identification. Our system utilizes LiDAR point clustering and an extended Kalman filter for stable neighbor tracking, providing reliable perception independent of external positioning systems. The core of our approach is a DRL controller, trained in GPU-accelerated Nvidia Isaac Sim, that enables followers to learn complex emergent behaviors - balancing flocking and obstacle avoidance - using only local perception. This allows the swarm to implicitly follow the leader while robustly addressing perceptual challenges such as occlusion and limited field-of-view. The robustness and sim-to-real transfer of our approach are confirmed through extensive simulations and challenging real-world experiments with a swarm of five UAVs, which successfully demonstrated collective navigation across diverse indoor and outdoor environments without any communication or external localization.

AIMar 10
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring

Seunghwan Kim, Tiffany H. Kung, Heena Verma et al.

Background: Remote patient monitoring (RPM) generates vast data, yet landmark trials (Tele-HF, BEAT-HF) failed because data volume overwhelmed clinical staff. While TIM-HF2 showed 24/7 physician-led monitoring reduces mortality by 30%, this model remains prohibitively expensive and unscalable. Methods: We developed Sentinel, an autonomous AI agent using Model Context Protocol (MCP) for contextual triage of RPM vitals via 21 clinical tools and multi-step reasoning. Evaluation included: (1) self-consistency (100 readings x 5 runs); (2) comparison against rule-based thresholds; and (3) validation against 6 clinicians (3 physicians, 3 NPs) using a connected matrix design. A leave-one-out (LOO) analysis compared the agent against individual clinicians; severe overtriage cases underwent independent physician adjudication. Results: Against a human majority-vote standard (N=467), the agent achieved 95.8% emergency sensitivity and 88.5% sensitivity for all actionable alerts (85.7% specificity). Four-level exact accuracy was 69.4% (quadratic-weighted kappa=0.778); 95.9% of classifications were within one severity level. In LOO analysis, the agent outperformed every clinician in emergency sensitivity (97.5% vs. 60.0% aggregate) and actionable sensitivity (90.9% vs. 69.5%). While disagreements skewed toward overtriage (22.5%), independent adjudication of severe gaps (>=2 levels) validated agent escalation in 88-94% of cases; consensus resolution validated 100%. The agent showed near-perfect self-consistency (kappa=0.850). Median cost was $0.34/triage. Conclusions: Sentinel triages RPM vitals with sensitivity exceeding individual clinicians. By automating systematic context synthesis, Sentinel addresses the core limitation of prior RPM trials, offering a scalable path toward the intensive monitoring shown to reduce mortality while maintaining a clinically defensible overtriage profile.

CVFeb 7, 2025
Neural Clustering for Prefractured Mesh Generation in Real-time Object Destruction

Seunghwan Kim, Sunha Park, Seungkyu Lee

Prefracture method is a practical implementation for real-time object destruction that is hardly achievable within performance constraints, but can produce unrealistic results due to its heuristic nature. To mitigate it, we approach the clustering of prefractured mesh generation as an unordered segmentation on point cloud data, and propose leveraging the deep neural network trained on a physics-based dataset. Our novel paradigm successfully predicts the structural weakness of object that have been limited, exhibiting ready-to-use results with remarkable quality.