Jihun Kim

CV
h-index9
13papers
33citations
Novelty52%
AI Score55

13 Papers

33.7CVApr 14
Bootstrapping Video Semantic Segmentation Model via Distillation-assisted Test-Time Adaptation

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.

Fully supervised Video Semantic Segmentation (VSS) relies heavily on densely annotated video data, limiting practical applicability. Alternatively, applying pre-trained Image Semantic Segmentation (ISS) models frame-by-frame avoids annotation costs but ignores crucial temporal coherence. Recent foundation models such as SAM2 enable high-quality mask propagation yet remain impractical for direct VSS due to limited semantic understanding and computational overhead. In this paper, we propose DiTTA (Distillation-assisted Test-Time Adaptation), a novel framework that converts an ISS model into a temporally-aware VSS model through efficient test-time adaptation (TTA), without annotated videos. DiTTA distills SAM2's temporal segmentation knowledge into the ISS model during a brief, single-pass initialization phase, complemented by a lightweight temporal fusion module to aggregate cross-frame context. Crucially, DiTTA achieves robust generalization even when adapting with highly limited partial video snippets (e.g., initial 10%), significantly outperforming zero-shot refinement approaches that repeatedly invoke SAM2 during inference. Extensive experiments on VSPW and Cityscapes demonstrate DiTTA's effectiveness, achieving competitive or superior performance relative to fully-supervised VSS methods, thus providing a practical and annotation-free solution for real-world VSS tasks.

CVOct 21, 2024Code
TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight

Hyun-Kurl Jang, Jihun Kim, Hyeokjun Kweon et al.

Semantic Scene Completion (SSC) aims to perform geometric completion and semantic segmentation simultaneously. Despite the promising results achieved by existing studies, the inherently ill-posed nature of the task presents significant challenges in diverse driving scenarios. This paper introduces TALoS, a novel test-time adaptation approach for SSC that excavates the information available in driving environments. Specifically, we focus on that observations made at a certain moment can serve as Ground Truth (GT) for scene completion at another moment. Given the characteristics of the LiDAR sensor, an observation of an object at a certain location confirms both 1) the occupation of that location and 2) the absence of obstacles along the line of sight from the LiDAR to that point. TALoS utilizes these observations to obtain self-supervision about occupancy and emptiness, guiding the model to adapt to the scene in test time. In a similar manner, we aggregate reliable SSC predictions among multiple moments and leverage them as semantic pseudo-GT for adaptation. Further, to leverage future observations that are not accessible at the current time, we present a dual optimization scheme using the model in which the update is delayed until the future observation is available. Evaluations on the SemanticKITTI validation and test sets demonstrate that TALoS significantly improves the performance of the pre-trained SSC model. Our code is available at https://github.com/blue-531/TALoS.

52.1OCMar 10
A Trust-Region Interior-Point Stochastic Sequential Quadratic Programming Method

Yuchen Fang, Jihun Kim, Sen Na et al.

In this paper, we propose a trust-region interior-point stochastic sequential quadratic programming (TR-IP-SSQP) method for solving optimization problems with a stochastic objective and deterministic nonlinear equality and inequality constraints. In this setting, exact evaluations of the objective function and its gradient are unavailable, but their stochastic estimates can be constructed. In particular, at each iteration our method builds stochastic oracles, which estimate the objective value and gradient to satisfy proper adaptive accuracy conditions with a fixed probability. To handle inequality constraints, we adopt an interior-point method (IPM), in which the barrier parameter follows a prescribed decaying sequence. Under standard assumptions, we establish global almost-sure convergence of the proposed method to first-order stationary points. We implement the method on a subset of problems from the CUTEst test set, as well as on logistic regression problems, to demonstrate its practical performance.

CVOct 20, 2020Code
Two-stage generative adversarial networks for document image binarization with color noise and background removal

Sungho Suh, Jihun Kim, Paul Lukowicz et al.

Document image enhancement and binarization methods are often used to improve the accuracy and efficiency of document image analysis tasks such as text recognition. Traditional non-machine-learning methods are constructed on low-level features in an unsupervised manner but have difficulty with binarization on documents with severely degraded backgrounds. Convolutional neural network-based methods focus only on grayscale images and on local textual features. In this paper, we propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks. In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image for document image enhancement. In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size. For the adversarial neural networks, we formulate loss functions between a discriminator and generators having an encoder-decoder structure. Experimental results show that the proposed method achieves better performance than many classical and state-of-the-art algorithms over the Document Image Binarization Contest (DIBCO) datasets, the LRDE Document Binarization Dataset (LRDE DBD), and our shipping label image dataset. We plan to release the shipping label dataset as well as our implementation code at github.com/opensuh/DocumentBinarization/.

CVNov 20, 2023
Balancing Efficiency and Quality: MoEISR for Arbitrary-Scale Image Super-Resolution

Young Jae Oh, Jihun Kim, Jihoon Nam et al.

Arbitrary-scale image super-resolution employing implicit neural functions has gained significant attention lately due to its capability to upscale images across diverse scales utilizing only a single model. Nevertheless, these methodologies have imposed substantial computational demands as they involve querying every target pixel to a single resource-intensive decoder. In this paper, we introduce a novel and efficient framework, the Mixture-of-Experts Implicit Super-Resolution (MoEISR), which enables super-resolution at arbitrary scales with significantly increased computational efficiency without sacrificing reconstruction quality. MoEISR dynamically allocates the most suitable decoding expert to each pixel using a lightweight mapper module, allowing experts with varying capacities to reconstruct pixels across regions with diverse complexities. Our experiments demonstrate that MoEISR successfully reduces significant amount of floating point operations (FLOPs) while delivering comparable or superior peak signal-to-noise ratio (PSNR).

55.2LGMay 7
Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

Ying Chen, Aoxi Li, Jihun Kim et al.

The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power are less explored, often relying on restrictive priors like a pretrained base matrix, ReLU activations or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without pretrained dense priors. We demonstrate a theoretical paradox: while purely rank-1 layers can exactly interpolate arbitrary scalar datasets, they collapse for function approximations. To overcome this bottleneck without surrendering parameter efficiency, we introduce a unified \textit{Structural Correspondence} framework. We prove that augmenting low-rank layers with only a minimal sparse diagonal component, say a Diagonal plus Low-Rank (DLoR) structure, is sufficient to reach Universal Approximation. We show that any full-rank transformation can be exactly reconstructed using these DLoR components by trading off network width (additive decomposition) or depth (multiplicative decomposition). By tracking asymptotic Taylor remainders, we prove that DLoR neural networks fully restore the Universal Approximation Theorem for general activation functions. Finally, we establish that multiplicative depth provides superior parameter-to-expressivity scaling compared to additive width. Our results show that dense matrices and specific activation functions are not topological prerequisites for universal expressivity.

OCMar 20, 2025
Subgradient Method for System Identification with Non-Smooth Objectives

Baturalp Yalcin, Jihun Kim, Javad Lavaei

This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting, where the optimization problems to be solved evolve over time as new measurements are collected, and we establish linear convergence to the ground-truth system for both the best and Polyak step sizes after a burn-in period. We further characterize sublinear convergence of the iterates under constant and diminishing step sizes, which require only minimal information and thus offer broad applicability. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.

LGFeb 20
SeedFlood: A Step Toward Scalable Decentralized Training of LLMs

Jihun Kim, Namhoon Lee

This work presents a new approach to decentralized training-SeedFlood-designed to scale for large models across complex network topologies and achieve global consensus with minimal communication overhead. Traditional gossip-based methods suffer from message communication costs that grow with model size, while information decay over network hops renders global consensus inefficient. SeedFlood departs from these practices by exploiting the seed-reconstructible structure of zeroth-order updates and effectively making the messages near-zero in size, allowing them to be flooded to every client in the network. This mechanism makes communication overhead negligible and independent of model size, removing the primary scalability bottleneck in decentralized training. Consequently, SeedFlood enables training in regimes previously considered impractical, such as billion-parameter models distributed across hundreds of clients. Our experiments on decentralized LLM fine-tuning demonstrate thatSeedFlood consistently outperforms gossip-based baselines in both generalization performance and communication efficiency, and even achieves results comparable to first-order methods in large scale settings.

LGNov 17, 2025
Real-time prediction of breast cancer sites using deformation-aware graph neural network

Kyunghyun Lee, Yong-Min Shin, Minwoo Shin et al.

Early diagnosis of breast cancer is crucial, enabling the establishment of appropriate treatment plans and markedly enhancing patient prognosis. While direct magnetic resonance imaging-guided biopsy demonstrates promising performance in detecting cancer lesions, its practical application is limited by prolonged procedure times and high costs. To overcome these issues, an indirect MRI-guided biopsy that allows the procedure to be performed outside of the MRI room has been proposed, but it still faces challenges in creating an accurate real-time deformable breast model. In our study, we tackled this issue by developing a graph neural network (GNN)-based model capable of accurately predicting deformed breast cancer sites in real time during biopsy procedures. An individual-specific finite element (FE) model was developed by incorporating magnetic resonance (MR) image-derived structural information of the breast and tumor to simulate deformation behaviors. A GNN model was then employed, designed to process surface displacement and distance-based graph data, enabling accurate prediction of overall tissue displacement, including the deformation of the tumor region. The model was validated using phantom and real patient datasets, achieving an accuracy within 0.2 millimeters (mm) for cancer node displacement (RMSE) and a dice similarity coefficient (DSC) of 0.977 for spatial overlap with actual cancerous regions. Additionally, the model enabled real-time inference and achieved a speed-up of over 4,000 times in computational cost compared to conventional FE simulations. The proposed deformation-aware GNN model offers a promising solution for real-time tumor displacement prediction in breast biopsy, with high accuracy and real-time capability. Its integration with clinical procedures could significantly enhance the precision and efficiency of breast cancer diagnosis.

CVJun 29, 2025
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.

Interactive segmentation (IS) allows users to iteratively refine object boundaries with minimal cues, such as positive and negative clicks. While the Segment Anything Model (SAM) has garnered attention in the IS community for its promptable segmentation capabilities, it often struggles in specialized domains or when handling complex scenarios (e.g., camouflaged or multi-part objects). To overcome these challenges, we propose DC-TTA, a novel test-time adaptation (TTA) framework that adapts SAM on a per-sample basis by leveraging user interactions as supervision. Instead of forcing a single model to incorporate all user clicks at once, DC-TTA partitions the clicks into more coherent subsets, each processed independently via TTA with a separated model. This Divide-and-Conquer strategy reduces conflicts among diverse cues and enables more localized updates. Finally, we merge the adapted models to form a unified predictor that integrates the specialized knowledge from each subset. Experimental results across various benchmarks demonstrate that DC-TTA significantly outperforms SAM's zero-shot results and conventional TTA methods, effectively handling complex tasks such as camouflaged object segmentation with fewer interactions and improved accuracy.

QMApr 9, 2025
Artificial Intelligence for Pediatric Height Prediction Using Large-Scale Longitudinal Body Composition Data

Dohyun Chun, Hae Woon Jung, Jongho Kang et al.

This study developed an accurate artificial intelligence model for predicting future height in children and adolescents using anthropometric and body composition data from the GP Cohort Study (588,546 measurements from 96,485 children aged 7-18). The model incorporated anthropometric measures, body composition, standard deviation scores, and growth velocity parameters, with performance evaluated using RMSE, MAE, and MAPE. Results showed high accuracy with males achieving average RMSE, MAE, and MAPE of 2.51 cm, 1.74 cm, and 1.14%, and females showing 2.28 cm, 1.68 cm, and 1.13%, respectively. Explainable AI approaches identified height SDS, height velocity, and soft lean mass velocity as crucial predictors. The model generated personalized growth curves by estimating individual-specific height trajectories, offering a robust tool for clinical decision support, early identification of growth disorders, and optimization of growth outcomes.

CVMar 30, 2024
Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model

Jihun Kim, Dahyun Kim, Hyungrok Jung et al.

Deploying deep models in real-world scenarios entails a number of challenges, including computational efficiency and real-world (e.g., long-tailed) data distributions. We address the combined challenge of learning long-tailed distributions using highly resource-efficient binary neural networks as backbones. Specifically, we propose a calibrate-and-distill framework that uses off-the-shelf pretrained full-precision models trained on balanced datasets to use as teachers for distillation when learning binary networks on long-tailed datasets. To better generalize to various datasets, we further propose a novel adversarial balancing among the terms in the objective function and an efficient multiresolution learning scheme. We conducted the largest empirical study in the literature using 15 datasets, including newly derived long-tailed datasets from existing balanced datasets, and show that our proposed method outperforms prior art by large margins (>14.33% on average).

LGSep 10, 2021
6MapNet: Representing soccer players from tracking data by a triplet network

Hyunsung Kim, Jihun Kim, Dongwook Chung et al.

Although the values of individual soccer players have become astronomical, subjective judgments still play a big part in the player analysis. Recently, there have been new attempts to quantitatively grasp players' styles using video-based event stream data. However, they have some limitations in scalability due to high annotation costs and sparsity of event stream data. In this paper, we build a triplet network named 6MapNet that can effectively capture the movement styles of players using in-game GPS data. Without any annotation of soccer-specific actions, we use players' locations and velocities to generate two types of heatmaps. Our subnetworks then map these heatmap pairs into feature vectors whose similarity corresponds to the actual similarity of playing styles. The experimental results show that players can be accurately identified with only a small number of matches by our method.