Hyungjin Kim

LG
h-index12
9papers
275citations
Novelty49%
AI Score54

9 Papers

CLOct 22, 2023Code
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images

Seowoo Lee, Jiwon Youn, Hyungjin Kim et al.

Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLAVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.62 for seven major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports. Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.

LGNov 10, 2025
Sensor Calibration Model Balancing Accuracy, Real-time, and Efficiency

Jinyong Yun, Hyungjin Kim, Seokho Ahn et al.

Most on-device sensor calibration studies benchmark models only against three macroscopic requirements (i.e., accuracy, real-time, and resource efficiency), thereby hiding deployment bottlenecks such as instantaneous error and worst-case latency. We therefore decompose this triad into eight microscopic requirements and introduce Scare (Sensor Calibration model balancing Accuracy, Real-time, and Efficiency), an ultra-compressed transformer that fulfills them all. SCARE comprises three core components: (1) Sequence Lens Projector (SLP) that logarithmically compresses time-series data while preserving boundary information across bins, (2) Efficient Bitwise Attention (EBA) module that replaces costly multiplications with bitwise operations via binary hash codes, and (3) Hash optimization strategy that ensures stable training without auxiliary loss terms. Together, these components minimize computational overhead while maintaining high accuracy and compatibility with microcontroller units (MCUs). Extensive experiments on large-scale air-quality datasets and real microcontroller deployments demonstrate that Scare outperforms existing linear, hybrid, and deep-learning baselines, making Scare, to the best of our knowledge, the first model to meet all eight microscopic requirements simultaneously.

CVAug 5, 2025Code
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

Hyungjin Kim, Seokho Ahn, Young-Duk Seo

Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.

LGNov 4, 2025
Stochastic Deep Graph Clustering for Practical Group Formation

Junhyung Park, Hyungjin Kim, Seokho Ahn et al.

While prior work on group recommender systems (GRSs) has primarily focused on improving recommendation accuracy, most approaches assume static or predefined groups, making them unsuitable for dynamic, real-world scenarios. We reframe group formation as a core challenge in GRSs and propose DeepForm (Stochastic Deep Graph Clustering for Practical Group Formation), a framework designed to meet three key operational requirements: (1) the incorporation of high-order user information, (2) real-time group formation, and (3) dynamic adjustment of the number of groups. DeepForm employs a lightweight GCN architecture that effectively captures high-order structural signals. Stochastic cluster learning enables adaptive group reconfiguration without retraining, while contrastive learning refines groups under dynamic conditions. Experiments on multiple datasets demonstrate that DeepForm achieves superior group formation quality, efficiency, and recommendation accuracy compared with various baselines.

LGApr 29
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

Seungyub Han, Hyungjin Kim, Jungwoo Lee

Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.

LGDec 28, 2024
Real-time Calibration Model for Low-cost Sensor in Fine-grained Time series

Seokho Ahn, Hyungjin Kim, Sungbok Shin et al.

Precise measurements from sensors are crucial, but data is usually collected from low-cost, low-tech systems, which are often inaccurate. Thus, they require further calibrations. To that end, we first identify three requirements for effective calibration under practical low-tech sensor conditions. Based on the requirements, we develop a model called TESLA, Transformer for effective sensor calibration utilizing logarithmic-binned attention. TESLA uses a high-performance deep learning model, Transformers, to calibrate and capture non-linear components. At its core, it employs logarithmic binning to minimize attention complexity. TESLA achieves consistent real-time calibration, even with longer sequences and finer-grained time series in hardware-constrained systems. Experiments show that TESLA outperforms existing novel deep learning and newly crafted linear models in accuracy, calibration speed, and energy efficiency.

LGFeb 12, 2025
SenDaL: An Effective and Efficient Calibration Framework of Low-Cost Sensors for Daily Life

Seokho Ahn, Hyungjin Kim, Euijong Lee et al.

The collection of accurate and noise-free data is a crucial part of Internet of Things (IoT)-controlled environments. However, the data collected from various sensors in daily life often suffer from inaccuracies. Additionally, IoT-controlled devices with low-cost sensors lack sufficient hardware resources to employ conventional deep-learning models. To overcome this limitation, we propose sensors for daily life (SenDaL), the first framework that utilizes neural networks for calibrating low cost sensors. SenDaL introduces novel training and inference processes that enable it to achieve accuracy comparable to deep learning models while simultaneously preserving latency and energy consumption similar to linear models. SenDaL is first trained in a bottom-up manner, making decisions based on calibration results from both linear and deep learning models. Once both models are trained, SenDaL makes independent decisions through a top-down inference process, ensuring accuracy and inference speed. Furthermore, SenDaL can select the optimal deep learning model according to the resources of the IoT devices because it is compatible with various deep learning models, such as long short-term memory-based and Transformer-based models. We have verified that SenDaL outperforms existing deep learning models in terms of accuracy, latency, and energy efficiency through experiments conducted in different IoT environments and real-life scenarios.

LGFeb 16, 2020
REST: Performance Improvement of a Black Box Model via RL-based Spatial Transformation

Jae Myung Kim, Hyungjin Kim, Chanwoo Park et al.

In recent years, deep neural networks (DNN) have become a highly active area of research, and shown remarkable achievements on a variety of computer vision tasks. DNNs, however, are known to often make overconfident yet incorrect predictions on out-of-distribution samples, which can be a major obstacle to real-world deployments because the training dataset is always limited compared to diverse real-world samples. Thus, it is fundamental to provide guarantees of robustness to the distribution shift between training and test time when we construct DNN models in practice. Moreover, in many cases, the deep learning models are deployed as black boxes and the performance has been already optimized for a training dataset, thus changing the black box itself can lead to performance degradation. We here study the robustness to the geometric transformations in a specific condition where the black-box image classifier is given. We propose an additional learner, \emph{REinforcement Spatial Transform learner (REST)}, that transforms the warped input data into samples regarded as in-distribution by the black-box models. Our work aims to improve the robustness by adding a REST module in front of any black boxes and training only the REST module without retraining the original black box model in an end-to-end manner, i.e. we try to convert the real-world data into training distribution which the performance of the black-box model is best suited for. We use a confidence score that is obtained from the black-box model to determine whether the transformed input is drawn from in-distribution. We empirically show that our method has an advantage in generalization to geometric transformations and sample efficiency.

ETJun 27, 2019
4K-Memristor Analog-Grade Passive Crossbar Circuit

Hyungjin Kim, Hussein Nili, Mahmood Mahmoodi et al.

The superior density of passive analog-grade memristive crossbars may enable storing large synaptic weight matrices directly on specialized neuromorphic chips, thus avoiding costly off-chip communication. To ensure efficient use of such crossbars in neuromorphic computing circuits, variations of current-voltage characteristics of crosspoint devices must be substantially lower than those of memory cells with select transistors. Apparently, this requirement explains why there were so few demonstrations of neuromorphic system prototypes using passive crossbars. Here we report a 64x64 passive metal-oxide memristor crossbar circuit with ~99% device yield, based on a foundry-compatible fabrication process featuring etch-down patterning and low-temperature budget, conducive to vertical integration. The achieved ~26% variations of switching voltages of our devices were sufficient for programming 4K-pixel gray-scale patterns with an average tuning error smaller than 4%. The analog properties were further verified by experimentally demonstrating MNIST pattern classification with a fidelity close to the software-modeled limit for a network of this size, with an ~1% average error of import of ex-situ-calculated synaptic weights. We believe that our work is a significant improvement over the state-of-the-art passive crossbar memories in both complexity and analog properties.