CVAug 31, 2023
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive DiffusionTom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
Motion capture from a limited number of body-worn sensors, such as inertial measurement units (IMUs) and pressure insoles, has important applications in health, human performance, and entertainment. Recent work has focused on accurately reconstructing whole-body motion from a specific sensor configuration using six IMUs. While a common goal across applications is to use the minimal number of sensors to achieve required accuracy, the optimal arrangement of the sensors might differ from application to application. We propose a single diffusion model, DiffusionPoser, which reconstructs human motion in real-time from an arbitrary combination of sensors, including IMUs placed at specified locations, and, pressure insoles. Unlike existing methods, our model grants users the flexibility to determine the number and arrangement of sensors tailored to the specific activity of interest, without the need for retraining. A novel autoregressive inferencing scheme ensures real-time motion reconstruction that closely aligns with measured sensor signals. The generative nature of DiffusionPoser ensures realistic behavior, even for degrees-of-freedom not directly measured. Qualitative results can be found on our website: https://diffusionposer.github.io/.
CVJan 31, 2023
NoiseTransfer: Image Noise Generation with Contrastive EmbeddingsSeunghwan Lee, Tae Hyun Kim
Deep image denoising networks have achieved impressive success with the help of a considerably large number of synthetic train datasets. However, real-world denoising is a still challenging problem due to the dissimilarity between distributions of real and synthetic noisy datasets. Although several real-world noisy datasets have been presented, the number of train datasets (i.e., pairs of clean and real noisy images) is limited, and acquiring more real noise datasets is laborious and expensive. To mitigate this problem, numerous attempts to simulate real noise models using generative models have been studied. Nevertheless, previous works had to train multiple networks to handle multiple different noise distributions. By contrast, we propose a new generative model that can synthesize noisy images with multiple different noise distributions. Specifically, we adopt recent contrastive learning to learn distinguishable latent features of the noise. Moreover, our model can generate new noisy images by transferring the noise characteristics solely from a single reference noisy image. We demonstrate the accuracy and the effectiveness of our noise model for both known and unknown noise removal.
IVMar 8, 2022
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot UlcersHan Joo Chae, Seunghwan Lee, Hyewon Son et al.
We introduce AiD Regen, a novel system that generates 3D wound models combining 2D semantic segmentation with 3D reconstruction so that they can be printed via 3D bio-printers during the surgery to treat diabetic foot ulcers (DFUs). AiD Regen seamlessly binds the full pipeline, which includes RGB-D image capturing, semantic segmentation, boundary-guided point-cloud processing, 3D model reconstruction, and 3D printable G-code generation, into a single system that can be used out of the box. We developed a multi-stage data preprocessing method to handle small and unbalanced DFU image datasets. AiD Regen's human-in-the-loop machine learning interface enables clinicians to not only create 3D regenerative patches with just a few touch interactions but also customize and confirm wound boundaries. As evidenced by our experiments, our model outperforms prior wound segmentation models and our reconstruction algorithm is capable of generating 3D wound models with compelling accuracy. We further conducted a case study on a real DFU patient and demonstrated the effectiveness of AiD Regen in treating DFU wounds.
LGJan 29, 2024Code
Enhancing Topological Dependencies in Spatio-Temporal Graphs with Cycle Message Passing BlocksMinho Lee, Yun Young Choi, Sun Woo Park et al.
Graph Neural Networks (GNNs) and Transformer-based models have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-temporal dependencies crucial for applications such as traffic datasets. Although many existing methods utilize multi-head attention mechanisms and message-passing neural networks (MPNNs) to capture both spatial and temporal relations, these approaches encode temporal and spatial relations independently, and reflect the graph's topological characteristics in a limited manner. In this work, we introduce the Cycle to Mixer (Cy2Mixer), a novel spatio-temporal GNN based on topological non-trivial invariants of spatio-temporal graphs with gated multi-layer perceptrons (gMLP). The Cy2Mixer is composed of three blocks based on MLPs: A temporal block for capturing temporal properties, a message-passing block for encapsulating spatial information, and a cycle message-passing block for enriching topological information through cyclic subgraphs. We bolster the effectiveness of Cy2Mixer with mathematical evidence emphasizing that our cycle message-passing block is capable of offering differentiated information to the deep learning model compared to the message-passing block. Furthermore, empirical evaluations substantiate the efficacy of the Cy2Mixer, demonstrating state-of-the-art performances across various spatio-temporal benchmark datasets. The source code is available at \url{https://github.com/leemingo/cy2mixer}.
LGDec 26, 2024Code
SyMerge: From Non-Interference to Synergistic Merging via Single-Layer AdaptationAecheon Jung, Seunghwan Lee, Dongyoon Han et al.
Model merging offers an efficient alternative to multi-task learning by combining independently fine-tuned models, but most prior approaches focus mainly on avoiding task interference. We argue instead that the real potential of merging lies in achieving synergy, where tasks enhance one another. Our intuition comes from a pilot study showing that when a classifier trained on one task is paired with the encoder of another, the resulting cross-task performance strongly predicts merge quality. Moreover, adapting even a single task-specific layer can substantially improve this compatibility, suggesting a simple yet powerful lever for synergy. Building on this insight, we introduce SyMerge, a lightweight framework that jointly optimizes one task-specific layer and merging coefficients. To ensure stability without labels, SyMerge employs a robust self-labeling strategy guided by expert model predictions, avoiding the pitfalls of entropy-based adaptation. This minimalist yet principled design achieves state-of-the-art results across vision, dense prediction, and NLP benchmarks, while also producing adapted layers that transfer effectively to other merging methods. Our code is available at https://aim-skku.github.io/SyMerge/
CVMar 8, 2023
InFusionSurf: Refining Neural RGB-D Surface Reconstruction Using Per-Frame Intrinsic Refinement and TSDF Fusion Prior LearningSeunghwan Lee, Gwanmo Park, Hyewon Son et al.
We introduce InFusionSurf, an innovative enhancement for neural radiance field (NeRF) frameworks in 3D surface reconstruction using RGB-D video frames. Building upon previous methods that have employed feature encoding to improve optimization speed, we further improve the reconstruction quality with minimal impact on optimization time by refining depth information. InFusionSurf addresses camera motion-induced blurs in each depth frame through a per-frame intrinsic refinement scheme. It incorporates the truncated signed distance field (TSDF) Fusion, a classical real-time 3D surface reconstruction method, as a pretraining tool for the feature grid, enhancing reconstruction details and training speed. Comparative quantitative and qualitative analyses show that InFusionSurf reconstructs scenes with high accuracy while maintaining optimization efficiency. The effectiveness of our intrinsic refinement and TSDF Fusion-based pretraining is further validated through an ablation study.
CVMar 19
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language ModelYoungwan Lee, Soojin Jang, Yoorhim Cho et al.
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments. However, existing benchmarks predominantly focus on elementary, single-hop relations, neglecting the multi-hop compositional reasoning and precise visual grounding essential for real-world scenarios. To address this, we introduce MultihopSpatial, offering three key contributions: (1) A comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives. (2) Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction - capabilities vital for robust VLA deployment. (3) MultihopSpatial-Train, a dedicated large-scale training corpus to foster spatial intelligence. Extensive evaluation of 37 state-of-the-art VLMs yields eight key insights, revealing that compositional spatial reasoning remains a formidable challenge. Finally, we demonstrate that reinforcement learning post-training on our corpus enhances both intrinsic VLM spatial reasoning and downstream embodied manipulation performance.
CVDec 9, 2025
Instance-Aware Test-Time Segmentation for Continual Domain ShiftsSeunghwan Lee, Inyoung Jung, Hojoon Lee et al.
Continual Test-Time Adaptation (CTTA) enables pre-trained models to adapt to continuously evolving domains. Existing methods have improved robustness but typically rely on fixed or batch-level thresholds, which cannot account for varying difficulty across classes and instances. This limitation is especially problematic in semantic segmentation, where each image requires dense, multi-class predictions. We propose an approach that adaptively adjusts pseudo labels to reflect the confidence distribution within each image and dynamically balances learning toward classes most affected by domain shifts. This fine-grained, class- and instance-aware adaptation produces more reliable supervision and mitigates error accumulation throughout continual adaptation. Extensive experiments across eight CTTA and TTA scenarios, including synthetic-to-real and long-term shifts, show that our method consistently outperforms state-of-the-art techniques, setting a new standard for semantic segmentation under evolving conditions.
LGMar 10, 2025
Task Vector Quantization for Memory-Efficient Model MergingYoungeun Kim, Seunghwan Lee, Aecheon Jung et al.
Model merging enables efficient multi-task models by combining task-specific fine-tuned checkpoints. However, storing multiple task-specific checkpoints requires significant memory, limiting scalability and restricting model merging to larger models and diverse tasks. In this paper, we propose quantizing task vectors (i.e., the difference between pre-trained and fine-tuned checkpoints) instead of quantizing fine-tuned checkpoints. We observe that task vectors exhibit a narrow weight range, enabling low precision quantization (e.g., 4 bit) within existing task vector merging frameworks. To further mitigate quantization errors within ultra-low bit precision (e.g., 2 bit), we introduce Residual Task Vector Quantization, which decomposes the task vector into a base vector and offset component. We allocate bits based on quantization sensitivity, ensuring precision while minimizing error within a memory budget. Experiments on image classification and dense prediction show our method maintains or improves model merging performance while using only 8% of the memory required for full-precision checkpoints.
LGNov 24, 2025
Hi-SAFE: Hierarchical Secure Aggregation for Lightweight Federated LearningHyeong-Gun Joo, Songnam Hong, Seunghwan Lee et al.
Federated learning (FL) faces challenges in ensuring both privacy and communication efficiency, particularly in resource-constrained environments such as Internet of Things (IoT) and edge networks. While sign-based methods, such as sign stochastic gradient descent with majority voting (SIGNSGD-MV), offer substantial bandwidth savings, they remain vulnerable to inference attacks due to exposure of gradient signs. Existing secure aggregation techniques are either incompatible with sign-based methods or incur prohibitive overhead. To address these limitations, we propose Hi-SAFE, a lightweight and cryptographically secure aggregation framework for sign-based FL. Our core contribution is the construction of efficient majority vote polynomials for SIGNSGD-MV, derived from Fermat's Little Theorem. This formulation represents the majority vote as a low-degree polynomial over a finite field, enabling secure evaluation that hides intermediate values and reveals only the final result. We further introduce a hierarchical subgrouping strategy that ensures constant multiplicative depth and bounded per-user complexity, independent of the number of users n.
CVJul 24, 2021
Cycled Compositional Learning between Images and TextJongseok Kim, Youngjae Yu, Seunghwan Lee et al.
We present an approach named the Cycled Composition Network that can measure the semantic distance of the composition of image-text embedding. First, the Composition Network transit a reference image to target image in an embedding space using relative caption. Second, the Correction Network calculates a difference between reference and retrieved target images in the embedding space and match it with a relative caption. Our goal is to learn a Composition mapping with the Composition Network. Since this one-way mapping is highly under-constrained, we couple it with an inverse relation learning with the Correction Network and introduce a cycled relation for given Image We participate in Fashion IQ 2020 challenge and have won the first place with the ensemble of our model.
CVMar 27, 2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ DataYoungjae Yu, Seunghwan Lee, Yuncheol Choi et al.
We present an approach named CurlingNet that can measure the semantic distance of composition of image-text embedding. In order to learn an effective image-text composition for the data in the fashion domain, our model proposes two key components as follows. First, the Delivery makes the transition of a source image in an embedding space. Second, the Sweeping emphasizes query-related components of fashion images in the embedding space. We utilize a channel-wise gating mechanism to make it possible. Our single model outperforms previous state-of-the-art image-text composition models including TIRG and FiLM. We participate in the first fashion-IQ challenge in ICCV 2019, for which ensemble of our model achieves one of the best performances.
CVMar 9, 2020
Restore from Restored: Video Restoration with Pseudo Clean VideoSeunghwan Lee, Donghyeon Cho, Jiwon Kim et al.
In this study, we propose a self-supervised video denoising method called "restore-from-restored." This method fine-tunes a pre-trained network by using a pseudo clean video during the test phase. The pseudo clean video is obtained by applying a noisy video to the baseline network. By adopting a fully convolutional neural network (FCN) as the baseline, we can improve video denoising performance without accurate optical flow estimation and registration steps, in contrast to many conventional video restoration methods, due to the translation equivariant property of the FCN. Specifically, the proposed method can take advantage of plentiful similar patches existing across multiple consecutive frames (i.e., patch-recurrence); these patches can boost the performance of the baseline network by a large margin. We analyze the restoration performance of the fine-tuned video denoising networks with the proposed self-supervision-based learning algorithm, and demonstrate that the FCN can utilize recurring patches without requiring accurate registration among adjacent frames. In our experiments, we apply the proposed method to state-of-the-art denoisers and show that our fine-tuned networks achieve a considerable improvement in denoising performance.
CVMar 9, 2020
Restore from Restored: Single Image Denoising with Pseudo Clean ImageSeunghwan Lee, Dongkyu Lee, Donghyeon Cho et al.
In this study, we propose a simple and effective fine-tuning algorithm called "restore-from-restored", which can greatly enhance the performance of fully pre-trained image denoising networks. Many supervised denoising approaches can produce satisfactory results using large external training datasets. However, these methods have limitations in using internal information available in a given test image. By contrast, recent self-supervised approaches can remove noise in the input image by utilizing information from the specific test input. However, such methods show relatively lower performance on known noise types such as Gaussian noise compared to supervised methods. Thus, to combine external and internal information, we fine-tune the fully pre-trained denoiser using pseudo training set at test time. By exploiting internal self-similar patches (i.e., patch-recurrence), the baseline network can be adapted to the given specific input image. We demonstrate that our method can be easily employed on top of the state-of-the-art denoising networks and further improve the performance on numerous denoising benchmark datasets including real noisy images.
CVJan 9, 2020
Self-Supervised Fast Adaptation for Denoising via Meta-LearningSeunghwan Lee, Donghyeon Cho, Jiwon Kim et al.
Under certain statistical assumptions of noise, recent self-supervised approaches for denoising have been introduced to learn network parameters without true clean images, and these methods can restore an image by exploiting information available from the given input (i.e., internal statistics) at test time. However, self-supervised methods are not yet combined with conventional supervised denoising methods which train the denoising networks with a large number of external training samples. Thus, we propose a new denoising approach that can greatly outperform the state-of-the-art supervised denoising methods by adapting their network parameters to the given input through selfsupervision without changing the networks architectures. Moreover, we propose a meta-learning algorithm to enable quick adaptation of parameters to the specific input at test time. We demonstrate that the proposed method can be easily employed with state-of-the-art denoising networks without additional parameters, and achieve state-of-the-art performance on numerous benchmark datasets.
CRSep 30, 2019
Analysis of error dependencies on NewHopeMinki Song, Seunghwan Lee, Eunsang Lee et al.
Among many submissions to the NIST post-quantum cryptography (PQC) project, NewHope is a promising key encapsulation mechanism (KEM) based on the Ring-Learning with errors (Ring-LWE) problem. Since NewHope is an indistinguishability (IND)-chosen ciphertext attack secure KEM by applying the Fujisaki-Okamoto transform to an IND-chosen plaintext attack secure public key encryption, accurate calculation of decryption failure rate (DFR) is required to guarantee resilience against attacks that exploit decryption failures. However, the current upper bound of DFR on NewHope is rather loose because the compression noise, the effect of encoding/decoding of NewHope, and the approximation effect of centered binomial distribution are not fully considered. Furthermore, since NewHope is a Ring-LWE based cryptosystem, there is a problem of error dependency among error coefficients, which makes accurate DFR calculation difficult. In this paper, we derive much tighter upper bound on DFR than the current upper bound using constraint relaxation and union bound. Especially, the above-mentioned factors are all considered in derivation of new upper bound and the centered binomial distribution is not approximated to subgaussian distribution. In addition, since the error dependency is considered, the new upper bound is much closer to the real DFR than the previous upper bound. Furthermore, the new upper bound is parameterized by using Chernoff-Cramer bound in order to facilitate calculation of new upper bound for the parameters of NewHope. Since the new upper bound is much lower than the DFR requirement of PQC, this DFR margin is used to improve the security and bandwidth efficiency of NewHope. As a result, the security level of NewHope is improved by 7.2 % or bandwidth efficiency is improved by 5.9 %.
CRMay 20, 2019
Improving security and bandwidth efficiency of NewHope using error-correction schemesMinki Song, Seunghwan Lee, Eunsang Lee et al.
Among many submissions to the NIST post-quantum cryptography (PQC) project, NewHope is a promising key encapsulation mechanism (KEM) based on the Ring-Learning with errors (Ring-LWE) problem. Since the most important factors to be considered for PQC are security and cost including bandwidth and time/space complexity, in this paper, by doing exact noise analysis and using Bose Chaudhuri Hocquenghem (BCH) codes, it is shown that the security and bandwidth efficiency of NewHope can be substantially improved. In detail, the decryption failure rate (DFR) of NewHope is recalculated by performing exact noise analysis, and it is shown that the DFR of NewHope has been too conservatively calculated. Since the recalculated DFR is much lower than the required $2^{-128}$, this DFR margin is exploited to improve the security up to 8.5 \% or the bandwidth efficiency up to 5.9 \% without changing the procedure of NewHope. The additive threshold encoding (ATE) used in NewHope is a simple error correcting code (ECC) robust to side channel attack, but its error-correction capability is relatively weak compared with other ECCs. Therefore, if a proper error-correction scheme is applied to NewHope, either security or bandwidth efficiency or both can be improved. Among various ECCs, BCH code has been widely studied for its application to cryptosystems due to its advantages such as no error floor problem. In this paper, the ATE and total noise channel are regarded as a super channel from an information-theoretic viewpoint. Based on this super channel analysis, various concatenated coding schemes of ATE and BCH code for NewHope have been investigated. Through numerical analysis, it is revealed that the security and bandwidth efficiency of NewHope are substantially improved by using the proposed error-correction schemes.
AIJun 27, 2012
Reasoning about Uncertainty in Metric SpacesSeunghwan Lee
We set up a model for reasoning about metric spaces with belief theoretic measures. The uncertainty in these spaces stems from both probability and metric. To represent both aspect of uncertainty, we choose an expected distance function as a measure of uncertainty. A formal logical system is constructed for the reasoning about expected distance. Soundness and completeness are shown for this logic. For reasoning on product metric space with uncertainty, a new metric is defined and shown to have good properties.