LGApr 7, 2022Code
Explicit Feature Interaction-aware Graph Neural NetworksMinkyu Kim, Hyun-Soo Choi, Jinho Kim
Graph neural networks (GNNs) are powerful tools for handling graph-structured data. However, their design often limits them to learning only higher-order feature interactions, leaving low-order feature interactions overlooked. To address this problem, we introduce a novel GNN method called explicit feature interaction-aware graph neural network (EFI-GNN). Unlike conventional GNNs, EFI-GNN is a multilayer linear network designed to model arbitrary-order feature interactions explicitly within graphs. To validate the efficacy of EFI-GNN, we conduct experiments using various datasets. The experimental results demonstrate that EFI-GNN has competitive performance with existing GNNs, and when a GNN is jointly trained with EFI-GNN, predictive performance sees an improvement. Furthermore, the predictions made by EFI-GNN are interpretable, owing to its linear construction. The source code of EFI-GNN is available at https://github.com/gim4855744/EFI-GNN
LGSep 30, 2022Code
Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature InteractionsMinkyu Kim, Hyun-Soo Choi, Jinho Kim
Neural Additive Models (NAMs) have recently demonstrated promising predictive performance while maintaining interpretability. However, their capacity is limited to capturing only first-order feature interactions, which restricts their effectiveness on real-world datasets. To address this limitation, we propose Higher-order Neural Additive Models (HONAMs), an interpretable machine learning model that effectively and efficiently captures feature interactions of arbitrary orders. HONAMs improve predictive accuracy without compromising interpretability, an essential requirement in high-stakes applications. This advantage of HONAM can help analyze and extract high-order interactions present in datasets. The source code for HONAM is publicly available at https://github.com/gim4855744/HONAM/.
LGNov 1, 2023
StableFDG: Style and Attention Based Learning for Federated Domain GeneralizationJungwuk Park, Dong-Jun Han, Jinho Kim et al.
Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of samples/domains in each client's local dataset. In this paper, we propose StableFDG, a style and attention based learning strategy for accomplishing federated domain generalization, introducing two key contributions. The first is style-based learning, which enables each client to explore novel styles beyond the original source domains in its local dataset, improving domain diversity based on the proposed style sharing, shifting, and exploration strategies. Our second contribution is an attention-based feature highlighter, which captures the similarities between the features of data samples in the same class, and emphasizes the important/common characteristics to better learn the domain-invariant characteristics of each class in data-poor FL scenarios. Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.
CYMay 6
Guidelines for Designing AI Technologies to Support Adult LearningJennifer M. Reddig, Glen R. Smith, Sanaz Ahmadzadeh Siyahrood et al.
AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the United States focused on adult learning and online education. Drawing on longitudinal deployment data, we conducted a reflexive thematic analysis to identify recurring challenges and design considerations across systems. These insights were synthesized into a set of 19 design guidelines intended to inform future AI-supported adult learning technologies. We demonstrate the utility of these guidelines through a heuristic evaluation of the deployed systems. Lastly, we present a guideline exploration tool that aids in the ideation of technologies by connecting the guidelines to stakeholder statements surfaced in the analysis process.
LGOct 14, 2024Code
Transparent Networks for Multivariate Time SeriesMinkyu Kim, Suan Lee, Jinho Kim
Transparent models, which are machine learning models that produce inherently interpretable predictions, are receiving significant attention in high-stakes domains. However, despite much real-world data being collected as time series, there is a lack of studies on transparent time series models. To address this gap, we propose a novel transparent neural network model for time series called Generalized Additive Time Series Model (GATSM). GATSM consists of two parts: 1) independent feature networks to learn feature representations, and 2) a transparent temporal module to learn temporal patterns across different time steps using the feature representations. This structure allows GATSM to effectively capture temporal patterns and handle dynamic-length time series while preserving transparency. Empirical experiments show that GATSM significantly outperforms existing generalized additive models and achieves comparable performance to black-box time series models, such as recurrent neural networks and Transformer. In addition, we demonstrate that GATSM finds interesting patterns in time series. The source code is available at https://github.com/gim4855744/GATSM.
CLNov 19, 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language ModelDongyoung Go, Taesun Whang, Chanhee Lee et al.
The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) has revolutionized information retrieval and expanded the practical applications of AI. However, current systems struggle in accurately interpreting user intent, employing diverse retrieval strategies, and effectively filtering unintended or inappropriate responses, limiting their effectiveness. This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework that addresses these challenges through a multi-stage pipeline comprising image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. CUE-M incorporates a robust filtering pipeline combining image-based, text-based, and multimodal classifiers, dynamically adapting to instance- and category-specific concern defined by organizational policies. Extensive experiments on real-word datasets and public benchmarks on knowledge-based VQA and safety demonstrated that CUE-M outperforms baselines and establishes new state-of-the-art results, advancing the capabilities of multimodal retrieval systems.
LGOct 6, 2025
Post-training quantization of vision encoders needs prefixing registersSeunghyeon Kim, Jinho Kim, Taesun Yeom et al.
Transformer-based vision encoders -- such as CLIP -- are central to multimodal intelligence, powering applications from autonomous web agents to robotic control. Since these applications often demand real-time processing of massive visual data, reducing the inference cost of vision encoders is critical. Post-training quantization offers a practical path, but remains challenging even at 8-bit precision due to massive-scale activations (i.e., outliers). In this work, we propose $\textit{RegCache}$, a training-free algorithm to mitigate outliers in vision encoders, enabling quantization with significantly smaller accuracy drops. The proposed RegCache introduces outlier-prone yet semantically meaningless prefix tokens to the target vision encoder, which prevents other tokens from having outliers. Notably, we observe that outliers in vision encoders behave differently from those in language models, motivating two technical innovations: middle-layer prefixing and token deletion. Experiments show that our method consistently improves the accuracy of quantized models across both text-supervised and self-supervised vision encoders.
IVAug 8, 2025
Zero-shot self-supervised learning of single breath-hold magnetic resonance cholangiopancreatography (MRCP) reconstructionJinho Kim, Marcel Dominik Nickel, Florian Knoll
Purpose: To investigate the feasibility of applying zero-shot self-supervised learning reconstruction to reduce breath-hold times in magnetic resonance cholangiopancreatography (MRCP). Methods: Breath-hold MRCP was acquired from 11 healthy volunteers on a 3T scanner using an incoherent k-space sampling pattern leading to a breath-hold duration of 14s. We evaluated zero-shot reconstruction of breath-hold MRCP against parallel imaging of respiratory-triggered MRCP acquired in 338s on average and compressed sensing reconstruction of breath-hold MRCP. To address the long computation times of zero-shot trainings, we used a training approach that leverages a pretrained network to reduce backpropagation depth during training. Results: Zero-shot learning reconstruction significantly improved visual image quality compared to compressed sensing reconstruction, particularly in terms of signal-to-noise ratio and ductal delineation, and reached a level of quality comparable to that of successful respiratory-triggered acquisitions with regular breathing patterns. Shallow training provided nearly equivalent reconstruction performance with a training time of 11 minutes in comparison to 271 minutes for a conventional zero-shot training. Conclusion: Zero-shot learning delivers high-fidelity MRCP reconstructions with reduced breath-hold times, and shallow training offers a practical solution for translation to time-constrained clinical workflows.
IVMay 6, 2024
Deep Learning-based Accelerated MR Cholangiopancreatography without Fully-sampled DataJinho Kim, Marcel Dominik Nickel, Florian Knoll
The purpose of this study was to accelerate MR cholangiopancreatography (MRCP) acquisitions using deep learning-based (DL) reconstruction at 3T and 0.55T. A total of 35 healthy volunteers underwent conventional two-fold accelerated MRCP scans at field strengths of 3T and 0.55T. We trained DL reconstructions using two different training strategies, supervised (SV) and self-supervised (SSV), with retrospectively six-fold undersampled data obtained at 3T. We then evaluated the DL reconstructions against standard techniques, parallel imaging (PI) and compressed sensing (CS), focusing on peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as metrics. We also tested DL reconstructions with prospectively accelerated acquisitions and evaluated their robustness when changing fields strengths from 3T to 0.55T. DL reconstructions demonstrated a reduction in average acquisition time from 599/542 to 255/180 seconds for MRCP at 3T/0.55T. In both retrospective and prospective undersampling, PSNR and SSIM of DL reconstructions were higher than those of PI and CS. At the same time, DL reconstructions preserved the image quality of undersampled data, including sharpness and the visibility of hepatobiliary ducts. In addition, both DL approaches produced high-quality reconstructions at 0.55T. In summary, DL reconstructions trained for highly accelerated MRCP enabled a reduction in acquisition time by a factor of 2.4/3.0 at 3T/0.55T while maintaining the image quality of conventional acquisitions.
ROSep 20, 2020
Comparing Feedback Linearization and Adaptive Backstepping Control for Airborne Orientation of Agile Ground Robots using Wheel Reaction TorqueJinho Kim, Daniel J. Gonzalez, Christopher M. Korpela
In this paper, two nonlinear methods for stabilizing the orientation of a Four-Wheel Independent Drive and Steering (4WIDS) robot while in the air are analyzed, implemented in simulation, and compared. AGRO (the Agile Ground Robot) is a 4WIDS inspection robot that can be deployed into unsafe environments by being thrown, and can use the reaction torque from its four wheels to command its orientation while in the air. Prior work has demonstrated on a hardware prototype that simple PD control with hand-tuned gains is sufficient, but hardly optimal, to stabilize the orientation in under 500ms. The goal of this work is to decrease the stabilization time and reject disturbances using nonlinear control methods. A model-based Feedback Linearization (FL) was added to compensate for the nonlinear Coriolis terms. However, with external disturbances, model uncertainty and sensor noise, the FL controller does not guarantee stability. As an alternative, a second controller was developed using backstepping methods with an adaptive compensator for external disturbances, model uncertainty, and sensor offset. The controller was designed using Lyapunov analysis. A simulation was written using the full nonlinear dynamics of AGRO in an isotropic steering configuration in which control authority over its pitch and roll are equalized. The PD+FL control method was compared to the backstepping control method using the same initial conditions in simulation. Both the backstepping controller and the PD+FL controller stabilized the system within 250 milliseconds. The adaptive backstepping controller was also able to achieve this performance with the adaptation law enabled and compensating for offset noisy sinusoidal disturbances.