Jihan Kim

MTRL-SCI
h-index: 26
10 papers
46 citations
Novelty: 50%
AI Score: 48

10 Papers

SY | Jan 11, 2018
A Zero-stealthy Attack for Sampled-data Control Systems via Input Redundancy

Jihan Kim, Gyunghoon Park, Hyungbo Shim et al.

In this paper, we introduce a new vulnerability of cyber-physical systems to malicious attack. It arises when the physical plant, which is modeled as a continuous-time LTI system, is controlled by a digital controller. In the sampled-data framework, most anomaly detectors monitor the plant's output only at discrete time instants, and thus nothing abnormal can be detected as long as the sampled output behaves normally. This implies that if an actuator attack drives the plant's state to pass through the kernel of the output matrix at each sensing time, then the attack compromises the system while remaining stealthy. We show that this type of attack always exists when the sampled-data system has input redundancy, i.e., the number of inputs is larger than that of the outputs or the sampling rate of the actuators is higher than that of the sensors. Simulation results for the X-38 vehicle and for other numerical examples illustrate that this new attack strategy can bring disastrous consequences.
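The kernel condition described above can be illustrated with a small numerical sketch. This is not the authors' construction: the plant (a double integrator with two inputs and one output), the sampling period `T`, and the helper `simulate_attack` are all assumptions chosen for illustration. Because the plant is linear, the attack-induced state deviation evolves independently of the nominal trajectory, so the sketch tracks only that deviation and checks that its sampled output stays at zero while the hidden state drifts.

```python
import numpy as np

T = 0.1  # sampling period (assumed value)

# Zero-order-hold discretization of x' = A x + B u with
# A = [[0, 1], [0, 0]] and B = I (two inputs), written in closed form.
Ad = np.array([[1.0, T], [0.0, 1.0]])
Bd = np.array([[T, T**2 / 2], [0.0, T]])
C = np.array([[1.0, 0.0]])  # one output: more inputs than outputs

CB = C @ Bd                          # 1x2 map from attack input to next sampled output
CB_pinv = np.linalg.pinv(CB)         # 2x1 right inverse of CB
null_dir = np.linalg.svd(CB)[2][-1]  # unit vector n with CB @ n = 0


def simulate_attack(steps=100, gain=1.0):
    """Propagate the attack-induced state deviation over `steps` samples.

    Each attack input has two parts: one cancels C @ Ad @ x so the next
    sampled output is zero, the other pushes along the null space of CB,
    which the sampled output cannot see.
    """
    x = np.zeros(2)
    sampled_outputs = []
    for _ in range(steps):
        a = -(CB_pinv @ (C @ Ad @ x)) + gain * null_dir
        x = Ad @ x + Bd @ a
        sampled_outputs.append((C @ x).item())
    return x, np.array(sampled_outputs)


final_state, outputs = simulate_attack()
print("largest sampled output deviation:", np.max(np.abs(outputs)))
print("final state deviation:", final_state)
```

In this toy setup the first state (the measured one) returns to zero at every sampling instant, while the second state moves away from the origin, which is the qualitative behavior the abstract describes.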

CL | Aug 1, 2023
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

Yeonghun Kang, Jihan Kim

ChatMOF is an autonomous Artificial Intelligence (AI) system built to predict and generate metal-organic frameworks (MOFs). By leveraging large language models (GPT-4 and GPT-3.5-turbo), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the need for rigid, structured queries. The system comprises three core components (an agent, a toolkit, and an evaluator) that form a robust pipeline managing a variety of tasks, including data retrieval, property prediction, and structure generation. The study further explores the merits and constraints of using large language model (LLM) based AI systems in materials science and showcases their transformative potential for future advancements.

MTRL-SCI | Nov 5, 2025
EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture

Seunghee Han, Yeonghun Kang, Taeun Bae et al.

Designing materials with targeted properties remains challenging due to the vastness of chemical space and the scarcity of property-labeled data. While recent advances in generative models offer a promising route to inverse design, most approaches require large datasets and must be retrained for every new target property. Here, we introduce EGMOF (Efficient Generation of MOFs), a hybrid diffusion-transformer framework that overcomes these limitations through a modular, descriptor-mediated workflow. EGMOF decomposes inverse design into two steps: (1) a one-dimensional diffusion model (Prop2Desc) that maps desired properties to chemically meaningful descriptors, followed by (2) a transformer model (Desc2MOF) that generates structures from these descriptors. This modular hybrid design enables minimal retraining and maintains high accuracy even under small-data conditions. On a hydrogen uptake dataset, EGMOF achieved over 95% validity and 84% hit rate, representing significant improvements of up to 57% in validity and 14% in hit rate compared to existing methods, while remaining effective with only 1,000 training samples. Moreover, our model successfully performed conditional generation across 29 diverse property datasets, including CoREMOF, QMOF, and text-mined experimental datasets, which previous models have not achieved. This work presents a data-efficient, generalizable approach to the inverse design of diverse MOFs and highlights the potential of modular inverse design workflows for broader materials discovery.

CV | Sep 19, 2023
MatGD: Materials Graph Digitizer

Jaewoong Lee, Wonseok Lee, Jihan Kim

We have developed MatGD (Material Graph Digitizer), a tool for digitizing data lines from scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating axes and data sections, (3) discerning the data lines by eliminating irrelevant graph objects and matching with the legend, and (4) extracting and saving the data. From 62,534 papers in the areas of batteries, catalysis, and MOFs, 501,045 figures were mined. Our tool achieved over 99% accuracy in legend marker and text detection. Moreover, its accuracy for data line separation was 66%, which is much higher than that of other existing figure mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and these data can be used to train various machine learning models that can enhance material predictions and new materials discovery.
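The final extraction step (4) generally requires converting pixel positions of a detected data line into data coordinates. The sketch below shows one common way to do that with a linear (or log-scale) calibration from two known axis ticks. It is an illustrative assumption, not MatGD's actual implementation; the function names `make_axis_mapper` and `digitize_line` and the example tick positions are hypothetical.

```python
import math


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Build a pixel -> data-value converter from two calibrated axis ticks."""
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must sit at different pixels")
    if log_scale:
        # Interpolate in log10 space for logarithmic axes.
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    slope = (value_b - value_a) / (pixel_b - pixel_a)

    def to_data(pixel):
        value = value_a + slope * (pixel - pixel_a)
        return 10 ** value if log_scale else value

    return to_data


def digitize_line(pixel_points, x_mapper, y_mapper):
    """Convert (px, py) pixel points of one data line into (x, y) data points."""
    return [(x_mapper(px), y_mapper(py)) for px, py in pixel_points]


# Hypothetical calibration: x ticks 0 and 10 at pixel columns 100 and 500;
# y ticks 0 and 100 at pixel rows 400 and 50 (image rows grow downward).
x_map = make_axis_mapper(100, 0.0, 500, 10.0)
y_map = make_axis_mapper(400, 0.0, 50, 100.0)
print(digitize_line([(300, 225), (500, 50)], x_map, y_map))
```

Handling the inverted pixel row direction inside the calibration, rather than as a separate flip, keeps the same helper usable for both axes.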

28.2 | DB | Apr 3
LitMOF: An LLM Multi-Agent for Literature-Validated Metal-Organic Frameworks Database Correction and Expansion

Honghui Kim, Dohoon Kim, Jihan Kim

Metal-organic framework (MOF) databases have grown rapidly through experimental deposition and large-scale literature extraction, but recent analyses show that nearly half of their entries contain substantial structural errors. These inaccuracies propagate through high-throughput screening and machine-learning workflows, limiting the reliability of data-driven MOF discovery. Correcting such errors is exceptionally difficult because true repairs require integrating crystallographic files, synthesis descriptions, and contextual evidence scattered across the literature. Here we introduce LitMOF, a large language model-driven multi-agent framework that validates crystallographic information directly from the original literature and cross-validates it with database entries to repair structural errors. Applying LitMOF to the experimental MOF database (the CSD MOF Subset), we constructed LitMOF-DB, a curated set of 186,773 computation-ready structures, including the successful repair of 8,771 invalid entries, which accounts for 65.3% of the not-computation-ready MOFs in the latest CoRE MOF database. Additionally, the system uncovered 12,646 experimentally reported MOFs absent from existing resources, substantially expanding the known experimental design space. Using direct air capture screening as a case study, we demonstrate that structural errors severely distort predicted adsorption energies and CO2/H2O selectivity, leading to systematic misranking of materials, false positives, and the omission of high-performance candidates. This work establishes a scalable pathway toward self-correcting scientific databases and a generalizable paradigm for LLM-driven curation in materials science.

35.1 | AI | Mar 31
SimMOF: AI agent for Automated MOF Simulations

Jaewoong Lee, Taeun Bae, Jihan Kim

Metal-organic frameworks (MOFs) offer a vast design space, and as such, computational simulations play a critical role in predicting their structural and physicochemical properties. However, MOF simulations remain difficult to access because reliable analysis requires expert decisions for workflow construction, parameter selection, tool interoperability, and the preparation of computation-ready structures. Here, we introduce SimMOF, a large language model-based multi-agent framework that automates end-to-end MOF simulation workflows from natural language queries. SimMOF translates user requests into dependency-aware plans, generates runnable inputs, orchestrates multiple agents to execute simulations, and summarizes results with analysis aligned to the user query. Through representative case studies, we show that SimMOF enables adaptive and cognitively autonomous workflows that reflect the iterative and decision-driven behavior of human researchers and as such provides a scalable foundation for data-driven MOF research.
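One way to picture a "dependency-aware plan" is as a directed acyclic graph of tasks that is executed in topological order. The sketch below uses Python's standard `graphlib` module for that ordering. The task names and the `execution_order` helper are hypothetical and are not taken from SimMOF; how SimMOF actually represents or schedules its plans is not stated in the abstract.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "prepare_structure": set(),
    "assign_force_field": {"prepare_structure"},
    "run_simulation": {"prepare_structure", "assign_force_field"},
    "summarize_results": {"run_simulation"},
}


def execution_order(task_graph):
    """Return tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(task_graph).static_order())


for step, task in enumerate(execution_order(plan), start=1):
    print(step, task)
```

A cyclic plan would make `static_order()` raise `graphlib.CycleError`, which gives a natural point to reject or repair an invalid plan before any simulation is launched.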

24.3 | MTRL-SCI | Apr 21
Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties

Seunghee Han, Jaewoong Lee, Jihan Kim

Metal-organic frameworks (MOFs) are a major target of machine-learning-based property prediction, yet most models assume that a single framework representation maps to a single property value. This assumption becomes problematic for experimental MOFs, where samples reported as the same framework can exhibit different properties because of differences in crystallinity, phase purity, defects, and other sample-dependent factors. Here we introduce Experimental X-ray Diffraction Integrated Transformer (EXIT), a multimodal transformer for sample-aware prediction of MOF properties that combines MOFid with X-ray diffraction (XRD). In EXIT, MOFid encodes MOF identity, whereas XRD provides complementary information about the experimentally realized sample state. EXIT is pre-trained on one million hypothetical MOFs with simulated XRD to learn transferable representations, leading to improved downstream performance relative to existing approaches. EXIT is fine-tuned on literature-derived experimental datasets for surface area and pore volume prediction. Incorporating experimental XRD improves predictive performance relative to models without experimental XRD, and attention analysis and sample-level case studies further show that EXIT assigns different predictions to samples sharing the same MOF identity when their XRD patterns differ. These results establish a practical step from framework-aware to sample-aware MOF property prediction and highlight the value of incorporating experimental characterization into porous materials informatics.

LG | Nov 26, 2024
Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining

Jaewoong Lee, Junhee Woo, Sejin Kim et al.

Recent advances in data-driven research have shown great potential in understanding the intricate relationships between materials and their performances. Herein, we introduce a novel multimodal data-driven approach employing an Automatic Battery data Collector (ABC) that integrates a large language model (LLM) with an automatic graph mining tool, Material Graph Digitizer (MatGD). This platform enables extraction of battery material data and cyclability performance metrics from diverse textual and graphical data sources with state-of-the-art accuracy. From the database derived through the ABC platform, we developed machine learning models that can accurately predict the capacity and stability of lithium metal batteries, which are the first models developed to achieve such predictions. Our models were also experimentally validated, confirming the practical applicability and reliability of our data-driven approach.

MTRL-SCI | Jun 18, 2024
Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

Seunghee Han, Byeong Gwan Lee, Dae Woon Lim et al.

Recently, metal-organic frameworks (MOFs) have demonstrated their potential as solid-state electrolytes in proton exchange membrane fuel cells. However, the number of MOFs reported to exhibit proton conductivity remains limited, and the mechanisms underlying this phenomenon are not fully elucidated, complicating the design of proton-conductive MOFs. In response, we developed a comprehensive database of proton-conductive MOFs and applied machine learning techniques to predict their proton conductivity. Our approach included the construction of both descriptor-based and transformer-based models. Notably, the transformer-based transfer learning (Freeze) model performed the best with a mean absolute error (MAE) of 0.91, suggesting that the proton conductivity of MOFs can be estimated within one order of magnitude using this model. Additionally, we employed feature importance and principal component analysis to explore the factors influencing proton conductivity. The insights gained from our database and machine learning model are expected to facilitate the targeted design of proton-conductive MOFs.
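The link between an MAE of 0.91 and "within one order of magnitude" makes sense if the error is measured on the base-10 logarithm of conductivity. That is an assumption here, since the abstract does not state the units. Under it, a log-scale error of 0.91 corresponds to a multiplicative factor of roughly 8, which is below 10. The helper names `log10_mae` and `fold_error` and the sample values are hypothetical.

```python
import math


def log10_mae(predicted, measured):
    """Mean absolute error between log10(predicted) and log10(measured)."""
    errors = [
        abs(math.log10(p) - math.log10(m))
        for p, m in zip(predicted, measured)
    ]
    return sum(errors) / len(errors)


def fold_error(mae_log10):
    """Typical multiplicative error implied by a log10-scale MAE."""
    return 10 ** mae_log10


# Made-up conductivities (S/cm), used only to exercise the functions.
predicted = [1e-3, 1e-5]
measured = [1e-4, 1e-4]
print("log10 MAE:", log10_mae(predicted, measured))
print("fold error for MAE 0.91:", fold_error(0.91))
```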

LG | May 20, 2024
Property-guided Inverse Design of Metal-Organic Frameworks Using Quantum Natural Language Processing

Shinyoung Kang, Jihan Kim

In this study, we explore the potential of using quantum natural language processing (QNLP) to inverse design metal-organic frameworks (MOFs) with targeted properties. Specifically, by analyzing 450 hypothetical MOF structures consisting of 3 topologies, 10 metal nodes and 15 organic ligands, we categorize these structures into four distinct classes for pore volume and $CO_{2}$ Henry's constant values. We then compare various QNLP models (i.e. the bag-of-words, DisCoCat (Distributional Compositional Categorical), and sequence-based models) to identify the most effective approach to process the MOF dataset. Using a classical simulator provided by IBM Qiskit, the bag-of-words model is identified as the best-performing model, achieving validation accuracies of 88.6% and 78.0% for binary classification tasks on pore volume and $CO_{2}$ Henry's constant, respectively. Further, we developed multi-class classification models tailored to the probabilistic nature of quantum circuits, with average test accuracies of 92% and 80% across different classes for the pore volume and $CO_{2}$ Henry's constant datasets, respectively. Finally, the generation of MOFs with target properties achieved accuracies of 93.5% for pore volume and 87% for $CO_{2}$ Henry's constant. Although our investigation covers only a fraction of the vast MOF search space, it marks a promising first step towards using quantum computing for materials design, offering a new perspective through which to explore the complex landscape of MOFs.
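The dataset size follows from the building-block counts: 3 topologies × 10 metal nodes × 15 organic ligands = 450 combinations. The sketch below enumerates that space and renders each combination as a short "sentence" of building-block words. The placeholder names and the `to_sentence` format are assumptions for illustration, since the abstract does not give the actual building blocks or the text encoding used.

```python
from itertools import product

# Placeholder building-block names (hypothetical, not from the paper).
topologies = [f"topology_{i}" for i in range(3)]
metal_nodes = [f"node_{i}" for i in range(10)]
ligands = [f"ligand_{i}" for i in range(15)]

# Every (topology, node, ligand) combination is one hypothetical MOF.
candidates = list(product(topologies, metal_nodes, ligands))


def to_sentence(topology, node, ligand):
    """Render one MOF as a whitespace-separated sequence of building-block words."""
    return f"{topology} {node} {ligand}"


sentences = [to_sentence(*mof) for mof in candidates]
print(len(candidates))
print(sentences[0])
```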