Sangwon Lee

CV
h-index 45
13 papers
414 citations
Novelty 50%
AI Score 56

13 Papers

AR · Jan 14, 2023
Failure Tolerant Training with Persistent Memory Disaggregation over CXL

Miryeong Kwon, Junhyeok Jang, Hanjin Choi et al.

This paper proposes TRAININGCXL, which can efficiently process large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be directly placed in the GPU's memory hierarchy, such that the GPU can access PMEM without software intervention. TRAININGCXL introduces computing and checkpointing logic near the CXL controller, thereby training data and managing persistency in an active manner. Considering PMEM's vulnerability, ii) we utilize the unique characteristics of recommendation models and take the checkpointing overhead off the critical path of their training. Lastly, iii) TRAININGCXL employs an advanced checkpointing technique that relaxes the updating sequence of model parameters and embeddings across training batches. The evaluation shows that TRAININGCXL achieves a 5.2x training performance improvement and 76% energy savings compared to modern PMEM-based recommendation systems.

88.0 · SY · May 19
Sensor Attack Detection Method for Encrypted State Observers

Yeongjun Jang, Sangwon Lee, Junsoo Kim

This paper proposes an encrypted state observer that is capable of detecting sensor attacks without decryption. We first design a state observer that operates over a finite field of integers with modular arithmetic. The observer generates a residue signal that indicates the presence of attacks under sparse attack and sensing redundancy conditions. Then, we develop a homomorphic encryption scheme that enables the observer to operate over encrypted data while automatically disclosing the residue signal. Unlike our previous work, which was restricted to single-input single-output systems, the proposed scheme is applicable to general multi-input multi-output systems. Provided that the disclosed residue signal remains below a prescribed threshold, the full state can be recovered as an encrypted message.

17.6 · CE · Apr 16
Transfer Learning-Based Surrogate Modeling for Nonlinear Time-History Response Analysis of High-Fidelity Structural Models

Keiichi Ishikawa, Yuma Matsumoto, Taro Yaoyama et al.

In a performance-based earthquake engineering (PBEE) framework, nonlinear time-history response analysis (NLTHA) for numerous ground motions is required to assess the seismic risk of buildings or civil engineering structures. However, such numerical simulations are computationally expensive, limiting the practical application of the framework. To address this issue, previous studies have used machine learning to predict the structural responses to ground motions at low computational cost. These studies typically conduct NLTHAs for a few hundred ground motions and use the results to train and validate surrogate models. However, most previous studies focused on computationally inexpensive response analysis models such as single-degree-of-freedom systems. Surrogate models of high-fidelity response analysis are required to enrich the quantity and diversity of information used for damage assessment in PBEE. Notably, the computational cost of creating training and validation datasets increases as the fidelity of the response analysis model increases. Therefore, methods that enable surrogate modeling of high-fidelity response analysis without a large number of training samples are needed. This study proposes a framework that uses transfer learning to construct the surrogate model of a high-fidelity response analysis model. This framework uses a surrogate model of low-fidelity response analysis as the pretrained model and transfers its knowledge to construct surrogate models for high-fidelity response analysis with substantially reduced computational cost. As a case study, surrogate models that predict responses of a 20-story steel moment frame were constructed with only 20 samples as the training dataset. The responses to the ground motions predicted by the constructed surrogate model were consistent with a site-specific time-based hazard.

LG · Jan 24, 2025
Humanity's Last Exam

Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

CV · Sep 28, 2025
CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement

Boseong Jeon, Junghyuk Lee, Jimin Park et al.

Recent works on object removal and insertion have enhanced their performance by handling object effects such as shadows and reflections, using diffusion models trained on counterfactual datasets. However, the performance impact of applying classifier-free guidance to handle object effects across removal and insertion tasks within a unified model remains largely unexplored. To address this gap and improve efficiency in composite editing, we propose CrimEdit, which jointly trains the task embeddings for removal and insertion within a single model and leverages them in a classifier-free guidance scheme -- enhancing the removal of both objects and their effects, and enabling controllable synthesis of object effects during insertion. CrimEdit also extends these two task prompts to be applied to spatially distinct regions, enabling object movement (repositioning) within a single denoising step. By employing both guidance techniques, extensive experiments show that CrimEdit achieves superior object removal, controllable effect insertion, and efficient object movement without requiring additional training or separate removal and insertion stages.
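The abstract above does not spell out CrimEdit's exact guidance rule, so the following is only a minimal sketch of the standard classifier-free guidance combination it builds on. The function name `cfg_combine`, the toy vectors, and the "removal" embedding label are illustrative assumptions, not the authors' code.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: move the unconditional noise
    prediction toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values, for illustration only).
eps_null = [0.0, 1.0, 2.0]     # prediction with a null task embedding
eps_removal = [1.0, 1.0, 0.0]  # prediction with a hypothetical "removal" embedding

# scale = 1 reproduces the conditional prediction; scale > 1 extrapolates past it.
guided = cfg_combine(eps_null, eps_removal, scale=2.0)
```

With `scale=0` the result is the unconditional prediction, which is one way to see how a guidance weight can act as a control knob for how strongly a task embedding shapes the output.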

RO · Aug 15, 2025
Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent

Che Rin Yu, Daewon Chae, Dabin Seo et al.

When humans perform everyday tasks, we naturally adjust our actions based on the current state of the environment. For instance, if we intend to put something into a drawer but notice it is closed, we open it first. However, many autonomous robots lack this adaptive awareness. They often follow pre-planned actions that may overlook subtle yet critical changes in the scene, which can result in actions being executed under outdated assumptions and eventual failure. While replanning is critical for robust autonomy, most existing methods respond only after failures occur, when recovery may be inefficient or infeasible. While proactive replanning holds promise for preventing failures in advance, current solutions often rely on manually designed rules and extensive supervision. In this work, we present a proactive replanning framework that detects and corrects failures at subtask boundaries by comparing scene graphs constructed from current RGB-D observations against reference graphs extracted from successful demonstrations. When the current scene fails to align with reference trajectories, a lightweight reasoning module is activated to diagnose the mismatch and adjust the plan. Experiments in the AI2-THOR simulator demonstrate that our approach detects semantic and spatial mismatches before execution failures occur, significantly improving task success and robustness.
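The comparison of a current scene graph against a reference graph can be illustrated with a small sketch. It assumes, purely for illustration, that a scene graph is a set of (subject, relation, object) triples; the function `graph_mismatch` and the example triples are hypothetical and do not reflect the paper's actual representation or reasoning module.

```python
def graph_mismatch(current, reference):
    """Return the triples expected by the reference but absent from the
    current scene, and the triples present now but not in the reference."""
    missing = reference - current
    unexpected = current - reference
    return missing, unexpected


# Hypothetical graphs at a subtask boundary ("put the cup in the drawer").
reference = {("drawer", "state", "open"), ("cup", "on", "table")}
current = {("drawer", "state", "closed"), ("cup", "on", "table")}

missing, unexpected = graph_mismatch(current, reference)

# Any difference would trigger the (not shown) reasoning step that adjusts the plan.
needs_replan = bool(missing or unexpected)
```

Here the closed drawer shows up as one missing and one unexpected triple, so the check flags the mismatch before the placing action is attempted.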

CV · Aug 2, 2025
Open-Attribute Recognition for Person Retrieval: Finding People Through Distinctive and Novel Attributes

Minjeong Park, Hongbeen Park, Sangwon Lee et al.

Pedestrian Attribute Recognition (PAR) plays a crucial role in various vision tasks such as person retrieval and identification. Most existing attribute-based retrieval methods operate under the closed-set assumption that all attribute classes are consistently available during both training and inference. However, this assumption limits their applicability in real-world scenarios where novel attributes may emerge. Moreover, predefined attributes in benchmark datasets are often generic and shared across individuals, making them less discriminative for retrieving the target person. To address these challenges, we propose the Open-Attribute Recognition for Person Retrieval (OAPR) task, which aims to retrieve individuals based on attribute cues, regardless of whether those attributes were seen during training. To support this task, we introduce a novel framework designed to learn generalizable body part representations that cover a broad range of attribute categories. Furthermore, we reconstruct four widely used datasets for open-attribute recognition. Comprehensive experiments on these datasets demonstrate the necessity of the OAPR task and the effectiveness of our framework. The source code and pre-trained models will be publicly available upon publication.

DATA-AN · Mar 13, 2025
Data augmentation using diffusion models to enhance inverse Ising inference

Yechan Lim, Sangwon Lee, Junghyo Jo

Identifying model parameters from observed configurations poses a fundamental challenge in data science, especially with limited data. Recently, diffusion models have emerged as a novel paradigm in generative machine learning, capable of producing new samples that closely mimic observed data. These models learn the gradient of model probabilities, bypassing the need for cumbersome calculations of partition functions across all possible configurations. We explore whether diffusion models can enhance parameter inference by augmenting small datasets. Our findings demonstrate this potential through a synthetic task involving inverse Ising inference and a real-world application of reconstructing missing values in neural activity data. This study serves as a proof-of-concept for using diffusion models for data augmentation in physics-related problems, thereby opening new avenues in data science.

ML · Jun 7, 2024
Bayesian Structural Model Updating with Multimodal Variational Autoencoder

Tatsuya Itoi, Kazuho Amishiki, Sangwon Lee et al.

A novel framework for Bayesian structural model updating is presented in this study. The proposed method utilizes the surrogate unimodal encoders of a multimodal variational autoencoder (VAE). The method facilitates an approximation of the likelihood when dealing with a small number of observations. It is particularly suitable for high-dimensional correlated simultaneous observations applicable to various dynamic analysis models. The proposed approach was benchmarked using a numerical model of a single-story frame building with acceleration and dynamic strain measurements. Additionally, an example involving a Bayesian update of nonlinear model parameters for a three-degree-of-freedom lumped mass model demonstrates computational efficiency when compared to using the original VAE, while maintaining adequate accuracy for practical applications.

CV · Mar 19, 2024
Emotion Recognition Using Transformers with Masked Learning

Seongjae Min, Junseok Yang, Sangjun Lim et al.

In recent years, deep learning has achieved innovative advancements in various fields, including the analysis of human emotions and behaviors. Initiatives such as the Affective Behavior Analysis in-the-wild (ABAW) competition have been particularly instrumental in driving research in this area by providing diverse and challenging datasets that enable precise evaluation of complex emotional states. This study leverages the Vision Transformer (ViT) and Transformer models to focus on the estimation of Valence-Arousal (VA), which signifies the positivity and intensity of emotions, recognition of various facial expressions, and detection of Action Units (AU) representing fundamental muscle movements. This approach transcends traditional Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) based methods, proposing a new Transformer-based framework that maximizes the understanding of temporal and spatial features. The core contributions of this research include the introduction of a learning technique through random frame masking and the application of Focal loss adapted for imbalanced data, enhancing the accuracy and applicability of emotion and behavior analysis in real-world settings. This approach is expected to contribute to the advancement of emotional computing and deep learning methodologies.
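The two techniques named in the abstract above, random frame masking and focal loss, can be sketched as follows. This is a minimal illustration under assumptions: the binary form of focal loss, the masking ratio, and the use of `None` as the mask value are choices made here, not details taken from the paper.

```python
import math
import random


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for predicted probability p and label y in {0, 1}.
    The factor (1 - p_t) ** gamma down-weights well-classified examples."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def mask_random_frames(frames, mask_ratio, rng, mask_value=None):
    """Replace a random subset of frames with mask_value."""
    n_mask = int(len(frames) * mask_ratio)
    masked_idx = set(rng.sample(range(len(frames)), n_mask))
    return [mask_value if i in masked_idx else f for i, f in enumerate(frames)]


frames = list(range(10))  # stand-ins for per-frame features
masked = mask_random_frames(frames, mask_ratio=0.3, rng=random.Random(0))

easy = focal_loss(0.9, 1)  # confident, correct prediction
hard = focal_loss(0.6, 1)  # less confident prediction
```

Setting `gamma=0` reduces the loss to ordinary cross-entropy, which makes the role of the focusing term easy to check.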

AR · Jan 23, 2022
Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs

Miryeong Kwon, Donghyun Gouk, Sangwon Lee et al.

Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges. In contrast to traditional deep learning, unique behaviors of the emerging GNNs are engaged with a large set of graphs and embedding data on storage, which exhibits complex and irregular preprocessing. We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them where the actual data exist in a holistic manner. It also enables RPC over PCIe such that the users can simply program GNNs through a graph semantic library without any knowledge of the underlying hardware or storage configurations. We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using high-performance modern GPUs by 7.1x while reducing energy consumption by 33.2x, on average.

DATA-AN · Jan 28, 2021
Inference of stochastic time series with missing data

Sangwon Lee, Vipul Periwal, Junghyo Jo

Inferring dynamics from time series is an important objective in data analysis. In particular, it is challenging to infer stochastic dynamics given incomplete data. We propose an expectation maximization (EM) algorithm that alternates between two steps: the E-step restores missing data points, while the M-step infers an underlying network model from the restored data. Using synthetic data generated by a kinetic Ising model, we confirm that the algorithm works for restoring missing data points as well as inferring the underlying model. At the initial iteration of the EM algorithm, the model inference shows better model-data consistency with observed data points than with missing data points. As we keep iterating, however, missing data points show better model-data consistency. We find that demanding equal consistency of observed and missing data points provides an effective stopping criterion for the iteration to prevent overshooting the most accurate model inference. Armed with this EM algorithm and stopping criterion, we infer missing data points and an underlying network from time-series data of real neuronal activities. Our method recovers collective properties of neuronal activities, such as time correlations and firing statistics, which have previously never been optimized to fit.

CV · May 26, 2020
An Effective Pipeline for a Real-world Clothes Retrieval System

Yang-Ho Ji, HeeJae Jun, Insik Kim et al.

In this paper, we propose an effective pipeline for a clothes retrieval system that is robust on large-scale real-world fashion data. Our proposed method consists of three components: detection, retrieval, and post-processing. We first conduct a detection task for precise retrieval of target clothes, then retrieve the corresponding items with a metric learning-based model. To improve the retrieval robustness against noise and misleading bounding boxes, we apply post-processing methods such as weighted boxes fusion and feature concatenation. With the proposed methodology, we achieved 2nd place in the DeepFashion2 Clothes Retrieval 2020 challenge.