Hyunsung Jang

CV
h-index: 7
Papers: 3
Citations: 25
Novelty: 52%
AI Score: 39

3 Papers

CV | Jul 18, 2024 | Code
Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation

Ilhoon Yoon, Hyeongjun Kwon, Jin Kim et al.

Source-Free domain adaptive Object Detection (SFOD) is a promising strategy for deploying trained detectors to new, unlabeled domains without accessing source data, addressing significant concerns around data privacy and efficiency. Most SFOD methods leverage a Mean-Teacher (MT) self-training paradigm that relies heavily on High-confidence Pseudo Labels (HPL). However, these HPL often overlook small instances that undergo significant appearance changes with domain shifts. Additionally, HPL ignore instances with low confidence due to the scarcity of training samples, resulting in biased adaptation toward familiar instances from the source domain. To address this limitation, we introduce the Low-confidence Pseudo Label Distillation (LPLD) loss within the Mean-Teacher based SFOD framework. This novel approach is designed to leverage proposals from the Region Proposal Network (RPN), which potentially encompass hard-to-detect objects in unfamiliar domains. First, we extract HPL using a standard pseudo-labeling technique and mine a set of Low-confidence Pseudo Labels (LPL) from the proposals generated by the RPN, keeping those that do not overlap significantly with HPL. These LPL are further refined by leveraging class-relation information and reducing the effect of inherent noise for the LPLD loss calculation. Furthermore, we use feature distance to adaptively weight the LPLD loss to focus on LPL containing a larger foreground area. Our method outperforms previous SFOD methods on four cross-domain object detection benchmarks. Extensive experiments demonstrate that our LPLD loss leads to effective adaptation by reducing false negatives and facilitating the use of domain-invariant knowledge from the source model. Code is available at https://github.com/junia3/LPLD.
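
The LPL mining step described in the abstract (keeping RPN proposals that do not overlap significantly with HPL) can be illustrated with a minimal sketch. The function names `iou` and `mine_low_confidence`, the box format `(x1, y1, x2, y2)`, and the threshold value `0.4` are assumptions made for illustration; they are not taken from the paper or its repository, and the class-relation refinement and feature-distance weighting are not shown.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_low_confidence(proposals, hpl_boxes, iou_thresh=0.4):
    """Keep proposals whose IoU with every HPL box is below iou_thresh.

    iou_thresh is a hypothetical value chosen for this sketch.
    """
    return [
        p for p in proposals
        if all(iou(p, h) < iou_thresh for h in hpl_boxes)
    ]


# Toy data: one high-confidence box and three RPN proposals.
hpl = [(10, 10, 50, 50)]
proposals = [(12, 12, 48, 48), (60, 60, 80, 80), (40, 40, 70, 70)]
lpl = mine_low_confidence(proposals, hpl)
print(lpl)
```

In this toy case the first proposal nearly coincides with the high-confidence box and is discarded, while the other two remain as low-confidence candidates.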

CV | Sep 26, 2023
Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation

Suhwan Cho, Minhyeok Lee, Jungho Lee et al.

Unsupervised video object segmentation aims to detect the most salient object in a video without any external guidance regarding the object. Salient objects often exhibit distinctive movements compared to the background, and recent methods leverage this by combining motion cues from optical flow maps with appearance cues from RGB images. However, because optical flow maps are often closely correlated with segmentation masks, networks can become overly dependent on motion cues during training, leading to vulnerability when faced with confusing motion cues and resulting in unstable predictions. To address this challenge, we propose a novel motion-as-option network that treats motion cues as an optional component rather than a necessity. During training, we randomly input RGB images into the motion encoder instead of optical flow maps, which implicitly reduces the network's reliance on motion cues. This design ensures that the motion encoder is capable of processing both RGB images and optical flow maps, leading to two distinct predictions depending on the type of input provided. To make the most of this flexibility, we introduce an adaptive output selection algorithm that determines the optimal prediction during testing.
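
The training-time trick described above (randomly feeding RGB images to the motion encoder in place of optical flow) can be sketched as follows. The helper name `pick_motion_input` and the probability parameter `p_rgb` are hypothetical; the abstract does not state the sampling probability, and the adaptive output selection used at test time is not shown here.

```python
import random


def pick_motion_input(rgb, flow, p_rgb=0.5, rng=random):
    """Choose what the motion encoder sees for one training sample.

    With probability p_rgb (an assumed value) the RGB image is used
    instead of the optical flow map; otherwise the flow map is used.
    """
    return rgb if rng.random() < p_rgb else flow


# Toy usage with placeholder strings standing in for image tensors.
rng = random.Random(0)
choices = [pick_motion_input("rgb", "flow", p_rgb=0.5, rng=rng) for _ in range(8)]
print(choices)
```

Because the same encoder sees both input types during training, it can produce a prediction from either one at test time, which is what makes the two candidate outputs mentioned in the abstract possible.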

LG | Oct 16, 2025
First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training

Gyudong Kim, Hyukju Na, Jin Hyeon Kim et al.

As training billion-scale transformers becomes increasingly common, employing multiple distributed GPUs along with parallel training methods has become standard practice. However, existing transformer designs suffer from significant communication overhead, especially in Tensor Parallelism (TP), where each block's MHA-MLP connection requires an all-reduce communication. Through our investigation, we show that the MHA-MLP connections can be bypassed for efficiency, while the attention output of the first layer can serve as an alternative signal for the bypassed connection. Motivated by these observations, we propose FAL (First Attentions Last), an efficient transformer architecture that redirects the first MHA output to the MLP inputs of the following layers, eliminating the per-block MHA-MLP connections. This removes the all-reduce communication and enables parallel execution of MHA and MLP on a single GPU. We also introduce FAL+, which adds the normalized first attention output to the MHA outputs of the following layers to augment the MLP input and improve model quality. Our evaluation shows that FAL reduces multi-GPU training time by up to 44%, improves single-GPU throughput by up to 1.18x, and achieves better perplexity compared to the baseline GPT. FAL+ achieves even lower perplexity without increasing training time relative to the baseline.
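
A toy dataflow sketch may help show why bypassing the per-block MHA-MLP connection lets the two sublayers run independently. Everything below is an assumption made for illustration: the function name `fal_forward`, the use of plain callables on scalars in place of real attention and MLP modules, the conventional treatment of the first layer, and the choice to form the MLP input as the block input plus the first attention output. Normalization is omitted, and the abstract does not specify these details.

```python
def fal_forward(x, mha_layers, mlp_layers):
    """Schematic forward pass in which later MLPs do not wait on their own MHA.

    mha_layers and mlp_layers are lists of callables (stand-ins for real
    sublayers). This is a sketch under stated assumptions, not the paper's
    implementation.
    """
    # Layer 1 (assumed conventional): MLP consumes the MHA result.
    first_attn = mha_layers[0](x)
    h = x + first_attn
    h = h + mlp_layers[0](h)

    # Later layers: the MLP input uses only the block input and first_attn,
    # so attn_out and mlp_out have no data dependency on each other.
    for mha, mlp in zip(mha_layers[1:], mlp_layers[1:]):
        attn_out = mha(h)
        mlp_out = mlp(h + first_attn)  # assumed combination rule
        h = h + attn_out + mlp_out
    return h


# Toy stand-ins: "attention" doubles its input, "MLP" adds one.
mha = lambda v: 2 * v
mlp = lambda v: v + 1
print(fal_forward(1.0, [mha, mha], [mlp, mlp]))
```

In a standard block the MLP input depends on that block's attention output, which is the dependency that forces an all-reduce between the two sublayers under tensor parallelism; in the sketch, the later-layer MLP call never reads `attn_out`.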