Donggeun Yoo

CV
h-index31
20papers
2,371citations
Novelty56%
AI Score45

20 Papers

CVMar 29, 2022Code
Interactive Multi-Class Tiny-Object Detection

Chunggi Lee, Seonwook Park, Heon Song et al.

Annotating tens or hundreds of tiny objects in a given image is laborious yet crucial for a multitude of Computer Vision tasks. Such imagery typically contains objects from various categories, yet the multi-class interactive annotation setting for the detection task has thus far been unexplored. To address these needs, we propose a novel interactive annotation method for multiple instances of tiny objects from multiple classes, based on a few point-based user inputs. Our approach, C3Det, relates the full image context with annotator inputs in a local and global manner via late-fusion and feature-correlation, respectively. We perform experiments on the Tiny-DOTA and LCell datasets using both two-stage and one-stage object detection architectures to verify the efficacy of our approach. Our approach outperforms existing approaches in interactive annotation, achieving higher mAP with fewer clicks. Furthermore, we validate the annotation efficiency of our approach in a user study where it is shown to be 2.85x faster and yield only 0.36x task load (NASA-TLX, lower is better) compared to manual annotation. The code is available at https://github.com/ChungYi347/Interactive-Multi-Class-Tiny-Object-Detection.

CVDec 9, 2022
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Mingu Kang, Heon Song, Seonwook Park et al.

Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning has shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data, to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently out-performs ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show leads to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.

IVMar 23, 2023
OCELOT: Overlapped Cell on Tissue Dataset for Histopathology

Jeongun Ryu, Aaron Valero Puche, JaeWoong Shin et al.

Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at https://lunit-io.github.io/research/publications/ocelot are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computation pathology.

CVSep 26, 2022
Improving Multi-fidelity Optimization with a Recurring Learning Rate for Hyperparameter Tuning

HyunJae Lee, Gihyeon Lee, Junhwan Kim et al.

Despite the evolution of Convolutional Neural Networks (CNNs), their performance is surprisingly dependent on the choice of hyperparameters. However, it remains challenging to efficiently explore large hyperparameter search space due to the long training times of modern CNNs. Multi-fidelity optimization enables the exploration of more hyperparameter configurations given budget by early termination of unpromising configurations. However, it often results in selecting a sub-optimal configuration as training with the high-performing configuration typically converges slowly in an early phase. In this paper, we propose Multi-fidelity Optimization with a Recurring Learning rate (MORL) which incorporates CNNs' optimization process into multi-fidelity optimization. MORL alleviates the problem of slow-starter and achieves a more precise low-fidelity approximation. Our comprehensive experiments on general image classification, transfer learning, and semi-supervised learning demonstrate the effectiveness of MORL over other multi-fidelity optimization methods such as Successive Halving Algorithm (SHA) and Hyperband. Furthermore, it achieves significant performance improvements over hand-tuned hyperparameter configuration within a practical budget.

CVJul 30, 2024
Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer

Biagio Brattoli, Mohammad Mostafavi, Taebum Lee et al.

Despite advancements in methodologies, immunohistochemistry (IHC) remains the most utilized ancillary test for histopathologic and companion diagnostics in targeted therapies. However, objective IHC assessment poses challenges. Artificial intelligence (AI) has emerged as a potential solution, yet its development requires extensive training for each cancer and IHC type, limiting versatility. We developed a Universal IHC (UIHC) analyzer, an AI model for interpreting IHC images regardless of tumor or IHC types, using training datasets from various cancers stained for PD-L1 and/or HER2. This multi-cohort trained model outperforms conventional single-cohort models in interpreting unseen IHCs (Kappa score 0.578 vs. up to 0.509) and consistently shows superior performance across different positive staining cutoff values. Qualitative analysis reveals that UIHC effectively clusters patches based on expression levels. The UIHC model also quantitatively assesses c-MET expression with MET mutations, representing a significant advancement in AI application in the era of personalized medicine and accumulating novel biomarkers.

CVApr 25, 2023
Bayesian Optimization Meets Self-Distillation

HyunJae Lee, Heon Song, Hyeonsoo Lee et al.

Bayesian optimization (BO) has contributed greatly to improving model performance by suggesting promising hyperparameter configurations iteratively based on observations from multiple training trials. However, only partial knowledge (i.e., the measured performances of trained models and their hyperparameter configurations) from previous trials is transferred. On the other hand, Self-Distillation (SD) only transfers partial knowledge learned by the task model itself. To fully leverage the various knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials for SD, which are otherwise abandoned in the conventional BO process. BOSS achieves significantly better performance than both BO and SD in a wide range of tasks including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis tasks.

CVJul 15, 2020Code
Learning Visual Context by Comparison

Minchul Kim, Jongchan Park, Seil Na et al.

Finding diseases from an X-ray image is an important yet highly challenging task. Current methods for solving this task exploit various characteristics of the chest X-ray image, but one of the most important characteristics is still missing: the necessity of comparison between related regions in an image. In this paper, we present Attend-and-Compare Module (ACM) for capturing the difference between an object of interest and its corresponding context. We show that explicit difference modeling can be very helpful in tasks that require direct comparison between locations from afar. This module can be plugged into existing deep learning models. For evaluation, we apply our module to three chest X-ray recognition tasks and COCO object detection & segmentation tasks and observe consistent improvements across tasks. The code is available at https://github.com/mk-minchul/attend-and-compare.

CVSep 11, 2025
OCELOT 2023: Cell Detection from Cell-Tissue Interaction Challenge

JaeWoong Shin, Jeongun Ryu, Aaron Valero Puche et al.

Pathologists routinely alternate between different magnifications when examining Whole-Slide Images, allowing them to evaluate both broad tissue morphology and intricate cellular details to form comprehensive diagnoses. However, existing deep learning-based cell detection models struggle to replicate these behaviors and learn the interdependent semantics between structures at different magnifications. A key barrier in the field is the lack of datasets with multi-scale overlapping cell and tissue annotations. The OCELOT 2023 challenge was initiated to gather insights from the community to validate the hypothesis that understanding cell and tissue (cell-tissue) interactions is crucial for achieving human-level performance, and to accelerate the research in this field. The challenge dataset includes overlapping cell detection and tissue segmentation annotations from six organs, comprising 673 pairs sourced from 306 The Cancer Genome Atlas (TCGA) Whole-Slide Images with hematoxylin and eosin staining, divided into training, validation, and test subsets. Participants presented models that significantly enhanced the understanding of cell-tissue relationships. Top entries achieved up to a 7.99 increase in F1-score on the test set compared to the baseline cell-only model that did not incorporate cell-tissue relationships. This is a substantial improvement in performance over traditional cell-only detection methods, demonstrating the need for incorporating multi-scale semantics into the models. This paper provides a comparative analysis of the methods used by participants, highlighting innovative strategies implemented in the OCELOT 2023 challenge.

IVAug 4, 2025
Identifying actionable driver mutations in lung cancer using an efficient Asymmetric Transformer Decoder

Biagio Brattoli, Jack Shi, Jongchan Park et al.

Identifying actionable driver mutations in non-small cell lung cancer (NSCLC) can impact treatment decisions and significantly improve patient outcomes. Despite guideline recommendations, broader adoption of genetic testing remains challenging due to limited availability and lengthy turnaround times. Machine Learning (ML) methods for Computational Pathology (CPath) offer a potential solution; however, research often focuses on only one or two common mutations, limiting the clinical value of these tools and the pool of patients who can benefit from them. This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations: ALK, BRAF, EGFR, ERBB2, KRAS, and MET ex14. Additionally, we introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality. This approach efficiently extracts information from patch embeddings and minimizes overfitting risks, proving highly adaptable to the MIL setting. Moreover, we present a method to directly utilize tissue type in the model, addressing a typical MIL limitation where either all regions or only some specific regions are analyzed, neglecting biological relevance. Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF, moving ML-based tests closer to being practical alternatives to standard genetic testing.

CVOct 25, 2019
Reducing Domain Gap by Reducing Style Bias

Hyeonseob Nam, HyunJae Lee, Jongchan Park et al.

Convolutional Neural Networks (CNNs) often fail to maintain their performance when they confront new test domains, which is known as the problem of domain shift. Recent studies suggest that one of the main causes of this problem is CNNs' strong inductive bias towards image styles (i.e. textures) which are sensitive to domain changes, rather than contents (i.e. shapes). Inspired by this, we propose to reduce the intrinsic style bias of CNNs to close the gap between domains. Our Style-Agnostic Networks (SagNets) disentangle style encodings from class categories to prevent style biased predictions and focus more on the contents. Extensive experiments show that our method effectively reduces the style bias and makes the model more robust under domain shift. It achieves remarkable performance improvements in a wide range of cross-domain tasks including domain generalization, unsupervised domain adaptation, and semi-supervised domain adaptation on multiple datasets.

CVSep 16, 2019
Visuomotor Understanding for Representation Learning of Driving Scenes

Seokju Lee, Junsik Kim, Tae-Hyun Oh et al.

Dashboard cameras capture a tremendous amount of driving scene video each day. These videos are purposefully coupled with vehicle sensing data, such as from the speedometer and inertial sensors, providing an additional sensing modality for free. In this work, we leverage the large-scale unlabeled yet naturally paired data for visual representation learning in the driving scenario. A representation is learned in an end-to-end self-supervised framework for predicting dense optical flow from a single frame with paired sensing data. We postulate that success on this task requires the network to learn semantic and geometric knowledge in the ego-centric view. For example, forecasting a future view to be seen from a moving vehicle requires an understanding of scene depth, scale, and movement of objects. We demonstrate that our learned representation can benefit other tasks that require detailed scene understanding and outperforms competing unsupervised representations on semantic segmentation.

CVJun 7, 2019
PseudoEdgeNet: Nuclei Segmentation only with Point Annotations

Inwan Yoo, Donggeun Yoo, Kyunghyun Paeng

Nuclei segmentation is one of the important tasks for whole slide image analysis in digital pathology. With the drastic advance of deep learning, recent deep networks have demonstrated successful performance of the nuclei segmentation task. However, a major bottleneck to achieving good performance is the cost for annotation. A large network requires a large number of segmentation masks, and this annotation task is given to pathologists, not the public. In this paper, we propose a weakly supervised nuclei segmentation method, which requires only point annotations for training. This method can scale to large training set as marking a point of a nucleus is much cheaper than the fine segmentation mask. To this end, we introduce a novel auxiliary network, called PseudoEdgeNet, which guides the segmentation network to recognize nuclei edges even without edge annotations. We evaluate our method with two public datasets, and the results demonstrate that the method consistently outperforms other weakly supervised methods.

CVMay 9, 2019
Learning Loss for Active Learning

Donggeun Yoo, In So Kweon

The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks human to annotate data that it perceived as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks but most of them are either designed specific for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks. We attach a small parametric module, named "loss prediction module," to a target network, and learn it to predict target losses of unlabeled inputs. Then, this module can suggest data that the target model is likely to produce a wrong prediction. This method is task-agnostic as networks are learned from a single loss regardless of target tasks. We rigorously validate our method through image classification, object detection, and human pose estimation, with the recent network architectures. The results demonstrate that our method consistently outperforms the previous methods over the tasks.

CVApr 12, 2018
Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning

Jongchan Park, Joon-Young Lee, Donggeun Yoo et al.

Learning-based color enhancement approaches typically learn to map from input images to retouched images. Most of existing methods require expensive pairs of input-retouched images or produce results in a non-interpretable way. In this paper, we present a deep reinforcement learning (DRL) based method for color enhancement to explicitly model the step-wise nature of human retouching process. We cast a color enhancement process as a Markov Decision Process where actions are defined as global color adjustment operations. Then we train our agent to learn the optimal global enhancement sequence of the actions. In addition, we present a 'distort-and-recover' training scheme which only requires high-quality reference images for training instead of input and retouched image pairs. Given high-quality reference images, we distort the images' color distribution and form distorted-reference image pairs for training. Through extensive experiments, we show that our method produces decent enhancement results and our DRL approach is more suitable for the 'distort-and-recover' training scheme than previous supervised approaches. Supplementary material and code are available at https://sites.google.com/view/distort-and-recover/

CVFeb 6, 2018
Learning Image Representations by Completing Damaged Jigsaw Puzzles

Dahun Kim, Donghyeon Cho, Donggeun Yoo et al.

In this paper, we explore methods of complicating self-supervised tasks for representation learning. That is, we do severe damage to data and encourage a network to recover them. First, we complicate each of three powerful self-supervised task candidates: jigsaw puzzle, inpainting, and colorization. In addition, we introduce a novel complicated self-supervised task called "Completing damaged jigsaw puzzles" which is puzzles with one piece missing and the other pieces without color. We train a convolutional neural network not only to solve the puzzles, but also generate the missing content and colorize the puzzles. The recovery of the aforementioned damage pushes the network to obtain robust and general-purpose representations. We demonstrate that complicating the self-supervised tasks improves their original versions and that our final task learns more robust and transferable representations compared to the previous methods, as well as the simple combination of our candidate tasks. Our approach achieves state-of-the-art performance in transfer learning on PASCAL classification and semantic segmentation.

CVAug 7, 2017
Two-Phase Learning for Weakly Supervised Object Localization

Dahun Kim, Donghyeon Cho, Donggeun Yoo et al.

Weakly supervised semantic segmentation and localiza- tion have a problem of focusing only on the most important parts of an image since they use only image-level annota- tions. In this paper, we solve this problem fundamentally via two-phase learning. Our networks are trained in two steps. In the first step, a conventional fully convolutional network (FCN) is trained to find the most discriminative parts of an image. In the second step, the activations on the most salient parts are suppressed by inference conditional feedback, and then the second learning is performed to find the area of the next most important parts. By combining the activations of both phases, the entire portion of the tar- get object can be captured. Our proposed training scheme is novel and can be utilized in well-designed techniques for weakly supervised semantic segmentation, salient region detection, and object location prediction. Detailed experi- ments demonstrate the effectiveness of our two-phase learn- ing in each task.

CVDec 20, 2016
Action-Driven Object Detection with Top-Down Visual Attentions

Donggeun Yoo, Sunggyun Park, Kyunghyun Paeng et al.

A dominant paradigm for deep learning based object detection relies on a "bottom-up" approach using "passive" scoring of class agnostic proposals. These approaches are efficient but lack of holistic analysis of scene-level context. In this paper, we present an "action-driven" detection mechanism using our "top-down" visual attention model. We localize an object by taking sequential actions that the attention model provides. The attention model conditioned with an image region provides required actions to get closer toward a target object. An action at each time step is weak itself but an ensemble of the sequential actions makes a bounding-box accurately converge to a target object boundary. This attention model we call AttentionNet is composed of a convolutional neural network. During our whole detection procedure, we only utilize the actions from a single AttentionNet without any modules for object proposals nor post bounding-box regression. We evaluate our top-down detection mechanism over the PASCAL VOC series and ILSVRC CLS-LOC dataset, and achieve state-of-the-art performances compared to the major bottom-up detection methods. In particular, our detection mechanism shows a strong advantage in elaborate localization by outperforming Faster R-CNN with a margin of +7.1% over PASCAL VOC 2007 when we increase the IoU threshold for positive detection to 0.7.

CVMar 24, 2016
Pixel-Level Domain Transfer

Donggeun Yoo, Namil Kim, Sunggyun Park et al.

We present an image-conditional image generation model. The model transfers an input domain to a target domain in semantic level, and generates the target image in pixel level. To generate realistic target images, we employ the real/fake-discriminator as in Generative Adversarial Nets, but also introduce a novel domain-discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high quality clothing dataset containing the two domains, and succeed in demonstrating decent results.

CVJun 25, 2015
AttentionNet: Aggregating Weak Directions for Accurate Object Detection

Donggeun Yoo, Sunggyun Park, Joon-Young Lee et al.

We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast an object detection problem as an iterative classification problem, which is the most suitable form of a CNN. AttentionNet provides quantized weak directions pointing a target object and the ensemble of iterative predictions from AttentionNet converges to an accurate object boundary box. Since AttentionNet is a unified network for object detection, it detects objects without any separated models from the object proposal to the post bounding-box regression. We evaluate AttentionNet by a human detection task and achieve the state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 with an 8-layered architecture only.

CVDec 4, 2014
Fisher Kernel for Deep Neural Activations

Donggeun Yoo, Sunggyun Park, Joon-Young Lee et al.

Compared to image representation based on low-level local descriptors, deep neural activations of Convolutional Neural Networks (CNNs) are richer in mid-level representation, but poorer in geometric invariance properties. In this paper, we present a straightforward framework for better image representation by combining the two approaches. To take advantages of both representations, we propose an efficient method to extract a fair amount of multi-scale dense local activations from a pre-trained CNN. We then aggregate the activations by Fisher kernel framework, which has been modified with a simple scale-wise normalization essential to make it suitable for CNN activations. Replacing the direct use of a single activation vector with our representation demonstrates significant performance improvements: +17.76 (Acc.) on MIT Indoor 67 and +7.18 (mAP) on PASCAL VOC 2007. The results suggest that our proposal can be used as a primary image representation for better performances in visual recognition tasks.