Joong‐won Hwang

h-index5

8papers

543citations

Novelty55%

AI Score54

Ranked #11,290 of 194,257 authors (top 6%)#4,202 in CV (top 7%)

8 Papers

8.2CVApr 30Code

Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining

Hyeonseo Jang, Jaebyeong Jeon, Joong-Won Hwang et al.

Test-time prompt tuning (TPT) has emerged as a promising technique for enhancing the adaptability of vision-language models by optimizing textual prompts using unlabeled test data. However, prior studies have observed that TPT often produces poorly calibrated models, raising concerns about the reliability of their predictions. Recent works address this issue by incorporating additional regularization terms that constrain model outputs, which improve calibration but often degrade performance. In this work, we reveal that these regularization strategies implicitly encourage optimization toward flatter minima, and that the sharpness of the loss landscape around adapted prompts is a key factor governing calibration quality. Motivated by this observation, we introduce Flatness-aware Prompt Pretraining (FPP), a simple yet effective pretraining framework for TPT that initializes prompts within flatter regions of the loss landscape prior to adaptation. We show that simply replacing the initialization in existing TPT pipelines--without modifying any other components--is sufficient to improve both calibration and performance. Notably, FPP requires no labeled data and incurs no additional computational costs during test-time tuning, making it highly practical for real-world deployment. The code is available at: https://github.com/YonseiML/fpp.

2.7LGMar 4Code

When and Where to Reset Matters for Long-Term Test-Time Adaptation

Taejun Lim, Joong-Won Hwang, Kibok Lee

When continual test-time adaptation (TTA) persists over the long term, errors accumulate in the model and further cause it to predict only a few classes for all inputs, a phenomenon known as model collapse. Recent studies have explored reset strategies that completely erase these accumulated errors. However, their periodic resets lead to suboptimal adaptation, as they occur independently of the actual risk of collapse. Moreover, their full resets cause catastrophic loss of knowledge acquired over time, even though such knowledge could be beneficial in the future. To this end, we propose (1) an Adaptive and Selective Reset (ASR) scheme that dynamically determines when and where to reset, (2) an importance-aware regularizer to recover essential knowledge lost due to reset, and (3) an on-the-fly adaptation adjustment scheme to enhance adaptability under challenging domain shifts. Extensive experiments across long-term TTA benchmarks demonstrate the effectiveness of our approach, particularly under challenging conditions. Our code is available at https://github.com/YonseiML/asr.

3.6CVAug 16, 2025Code

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability

Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.

Recent advances in computer vision have made training object detectors more efficient and effective; however, assessing their performance in real-world applications still relies on costly manual annotation. To address this limitation, we develop an automated model evaluation (AutoEval) framework for object detection. We propose Prediction Consistency and Reliability (PCR), which leverages the multiple candidate bounding boxes that conventional detectors generate before non-maximum suppression (NMS). PCR estimates detection performance without ground-truth labels by jointly measuring 1) the spatial consistency between boxes before and after NMS, and 2) the reliability of the retained boxes via the confidence scores of overlapping boxes. For a more realistic and scalable evaluation, we construct a meta-dataset by applying image corruptions of varying severity. Experimental results demonstrate that PCR yields more accurate performance estimates than existing AutoEval methods, and the proposed meta-dataset covers a wider range of detection performance. The code is available at https://github.com/YonseiML/autoeval-det.

2.0CVMar 18, 2024

OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

Seungbeom Woo, Geonwoo Baek, Taehoon Kim et al.

Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher architectures, where each teacher specializes in one target domain to simplify the task. However, these architectures hinder the student model from fully assimilating comprehensive knowledge from all target-specific teachers and escalate training costs with increasing target domains. In this paper, we propose an ouroboric domain bridging (OurDB) framework, offering an efficient solution to the MTDA problem using a single teacher architecture. This framework dynamically cycles through multiple target domains, aligning each domain individually to restrain the biased alignment problem, and utilizes Fisher information to minimize the forgetting of knowledge from previous target domains. We also propose a context-guided class-wise mixup (CGMix) that leverages contextual information tailored to diverse target contexts in MTDA. Experimental evaluations conducted on four urban driving datasets (i.e., GTA5, Cityscapes, IDD, and Mapillary) demonstrate the superiority of our method over existing state-of-the-art approaches.

9.0LGSep 21, 2020

Adversarial Training with Stochastic Weight Average

Joong-Won Hwang, Youngwan Lee, Sungchan Oh et al.

Adversarial training deep neural networks often experience serious overfitting problem. Recently, it is explained that the overfitting happens because the sample complexity of training data is insufficient to generalize robustness. In traditional machine learning, one way to relieve overfitting from the lack of data is to use ensemble methods. However, adversarial training multiple networks is extremely expensive. Moreover, we found that there is a dilemma on choosing target model to generate adversarial examples. Optimizing attack to the members of ensemble will be suboptimal attack to the ensemble and incurs covariate shift, while attack to ensemble will weaken the members and lose the benefit from ensembling. In this paper, we propose adversarial training with Stochastic weight average (SWA); while performing adversarial training, we aggregate the temporal weight states in the trajectory of training. By adopting SWA, the benefit of ensemble can be gained without tremendous computational increment and without facing the dilemma. Moreover, we further improved SWA to be adequate to adversarial training. The empirical results on CIFAR-10, CIFAR-100 and SVHN show that our method can improve the robustness of models.

12.8CVJun 28, 2020

Localization Uncertainty Estimation for Anchor-Free Object Detection

Youngwan Lee, Joong-won Hwang, Hyung-Il Kim et al.

Since many safety-critical systems, such as surgical robots and autonomous driving cars operate in unstable environments with sensor noise and incomplete data, it is desirable for object detectors to take the localization uncertainty into account. However, there are several limitations of the existing uncertainty estimation methods for anchor-based object detection. 1) They model the uncertainty of the heterogeneous object properties with different characteristics and scales, such as location (center point) and scale (width, height), which could be difficult to estimate. 2) They model box offsets as Gaussian distributions, which is not compatible with the ground truth bounding boxes that follow the Dirac delta distribution. 3) Since anchor-based methods are sensitive to anchor hyper-parameters, their localization uncertainty could also be highly sensitive to the choice of hyper-parameters. To tackle these limitations, we propose a new localization uncertainty estimation method called UAD for anchor-free object detection. Our method captures the uncertainty in four directions of box offsets (left, right, top, bottom) that are homogeneous, so that it can tell which direction is uncertain, and provide a quantitative value of uncertainty in [0, 1]. To enable such uncertainty estimation, we design a new uncertainty loss, negative power log-likelihood loss, to measure the localization uncertainty by weighting the likelihood loss by its IoU, which alleviates the model misspecification problem. Furthermore, we propose an uncertainty-aware focal loss for reflecting the estimated uncertainty to the classification score. Experimental results on COCO datasets demonstrate that our method significantly improves FCOS, by up to 1.8 points, without sacrificing computational efficiency.

27.8CVApr 22, 2019

An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

Youngwan Lee, Joong-won Hwang, Sangrok Lee et al.

As DenseNet conserves intermediate features with diverse receptive fields by aggregating them with dense connection, it shows good performance on the object detection task. Although feature reuse enables DenseNet to produce strong features with a small number of model parameters and FLOPs, the detector with DenseNet backbone shows rather slow speed and low energy efficiency. We find the linearly increasing input channel by dense connection leads to heavy memory access cost, which causes computation overhead and more energy consumption. To solve the inefficiency of DenseNet, we propose an energy and computation efficient architecture called VoVNet comprised of One-Shot Aggregation (OSA). The OSA not only adopts the strength of DenseNet that represents diversified features with multi receptive fields but also overcomes the inefficiency of dense connection by aggregating all features only once in the last feature maps. To validate the effectiveness of VoVNet as a backbone network, we design both lightweight and large-scale VoVNet and apply them to one-stage and two-stage object detectors. Our VoVNet based detectors outperform DenseNet based ones with 2x faster speed and the energy consumptions are reduced by 1.6x - 4.1x. In addition to DenseNet, VoVNet also outperforms widely used ResNet backbone with faster speed and better energy efficiency. In particular, the small object detection performance has been significantly improved over DenseNet and ResNet.

3.8CVDec 1, 2017

Rank of Experts: Detection Network Ensemble

Seung-Hwan Bae, Youngwan Lee, Youngjoo Jo et al.

The recent advances of convolutional detectors show impressive performance improvement for large scale object detection. However, in general, the detection performance usually decreases as the object classes to be detected increases, and it is a practically challenging problem to train a dominant model for all classes due to the limitations of detection models and datasets. In most cases, therefore, there are distinct performance differences of the modern convolutional detectors for each object class detection. In this paper, in order to build an ensemble detector for large scale object detection, we present a conceptually simple but very effective class-wise ensemble detection which is named as Rank of Experts. We first decompose an intractable problem of finding the best detections for all object classes into small subproblems of finding the best ones for each object class. We then solve the detection problem by ranking detectors in order of the average precision rate for each class, and then aggregate the responses of the top ranked detectors (i.e. experts) for class-wise ensemble detection. The main benefit of our method is easy to implement and does not require any joint training of experts for ensemble. Based on the proposed Rank of Experts, we won the 2nd place in the ILSVRC 2017 object detection competition.