CV · Aug 15, 2022
Hierarchical Attention Network for Few-Shot Object Detection via Meta-Contrastive Learning
Dongwoo Park, Jong-Min Lee · berkeley
Few-shot object detection (FSOD) aims to classify and detect objects of novel categories from only a few images. Because of structural limitations, existing meta-learning methods do not fully exploit the relationship between support and query features. We propose a hierarchical attention network with sequentially larger receptive fields to fully exploit the query and support images. In addition, meta-learning does not distinguish categories well because it only determines whether the support and query images match. In other words, metric-based learning is ineffective for classification because it does not optimize for classification directly. We therefore propose a contrastive learning method, called meta-contrastive learning, that directly serves the purpose of the meta-learning strategy. Finally, our network establishes a new state of the art by significant margins: it improves AP by 2.3, 1.0, 1.3, 3.4, and 2.4% for 1- to 30-shot object detection on the COCO dataset. Our code is available at: https://github.com/infinity7428/hANMCL
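To make "directly helps achieve the purpose of meta-learning" concrete, here is a minimal sketch of a generic prototype-based contrastive (InfoNCE-style) loss. The query embedding is pulled toward the prototype of its own class, which is averaged from support embeddings, and pushed away from every other class prototype. This is an illustrative assumption, not the paper's meta-contrastive formulation. The function names, the cosine similarity, and the temperature of 0.1 are all hypothetical choices.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos


def prototype_contrastive_loss(query: Vector, label: str,
                               prototypes: Dict[str, Vector],
                               temperature: float = 0.1) -> float:
    """InfoNCE-style loss (hypothetical stand-in for meta-contrastive learning):
    the query's own class prototype is the positive, all others are negatives."""
    logits = {c: cosine(query, p) / temperature for c, p in prototypes.items()}
    m = max(logits.values())  # shift by the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    support = {"cat": [[1.0, 0.1], [0.9, -0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
    protos = class_prototypes(support)
    query = [0.95, 0.05]
    print(prototype_contrastive_loss(query, "cat", protos))  # small: correct class
    print(prototype_contrastive_loss(query, "dog", protos))  # large: wrong class
```

Unlike a match/no-match score, this objective compares the query against all classes at once, so minimizing it directly separates the categories.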
CV · Apr 17, 2023
Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture
Taeho Kim, Jong-Min Lee
Most invariance-based self-supervised methods pretrain on object-centric images containing a single object (e.g., ImageNet images) and learn features that are invariant to geometric transformations. However, when images are not object-centric, cropping can significantly alter their semantics. Furthermore, as the model becomes insensitive to geometric transformations, it may struggle to capture location information. We therefore propose a Geometric Transformation Sensitive Architecture designed to be sensitive to geometric transformations, specifically four-fold rotation, random crop, and multi-crop. Our method makes the student sensitive to these transformations by having it predict the rotation and by using targets that vary with the transformations, obtained by pooling and rotating the teacher feature map. Additionally, we use a patch correspondence loss to encourage correspondence between patches with similar features. This captures long-term dependencies more appropriately than encouraging local-to-global correspondence, which is what happens when a model learns to be insensitive to multi-crop. When non-object-centric images are used as pretraining data, our approach outperforms methods that train the model to be insensitive to geometric transformations. We surpass the DINO [Caron et al., 2021b] baseline on image classification, semantic segmentation, detection, and instance segmentation, with improvements of 4.9 Top-1 Acc, 3.3 mIoU, 3.4 $AP^b$, and 2.7 $AP^m$. Code and pretrained models are publicly available at: https://github.com/bok3948/GTSA
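The idea of "targets that vary with those transformations" can be sketched as follows. The teacher sees the full image, and its feature map is cropped to the student's crop box, average-pooled to the student's resolution, and then rotated by the same multiple of 90 degrees applied to the student's input. The rotation index k also serves as the label for the student's rotation-prediction head. This is a simplified illustration under stated assumptions, not the GTSA implementation. Grids hold one scalar per location instead of feature vectors, the helper names are hypothetical, and the patch correspondence loss is not shown.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]  # one scalar per spatial location, for simplicity


def rot90(grid: Grid, k: int) -> Grid:
    """Rotate a 2-D grid counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transposing and then reversing the row order is a 90-degree CCW turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return [list(row) for row in grid]


def crop(grid: Grid, top: int, left: int, h: int, w: int) -> Grid:
    """Cut the h x w region starting at (top, left) out of a grid."""
    return [row[left:left + w] for row in grid[top:top + h]]


def avg_pool(grid: Grid, size: int) -> Grid:
    """Non-overlapping size x size average pooling (sides must divide evenly)."""
    rows, cols = len(grid), len(grid[0])
    return [[sum(grid[i + di][j + dj] for di in range(size) for dj in range(size)) / size ** 2
             for j in range(0, cols, size)]
            for i in range(0, rows, size)]


def sensitive_target(teacher_map: Grid, box: Tuple[int, int, int, int],
                     k: int, pool: int = 1) -> Grid:
    """Hypothetical target that follows the student's view: crop the teacher map
    to the student's crop box, pool it to the student's resolution, then rotate
    it by the same k * 90 degrees applied to the student's input."""
    region = crop(teacher_map, *box)
    if pool > 1:
        region = avg_pool(region, pool)
    return rot90(region, k)


def mse(a: Grid, b: Grid) -> float:
    """Mean squared error between two grids of equal shape."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    k = rng.randrange(4)  # rotation label the student must also predict
    target = sensitive_target(teacher, (0, 0, 4, 4), k, pool=2)
    print(k, target)
```

An invariant objective would compare the student to the same target regardless of k or the crop box. Here the target moves with the transformation, so the student must keep spatial layout information to match it.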
IV · Jan 24, 2025
Guided Neural Schrödinger bridge for Brain MR image synthesis with Limited Data
Hanyeol Yang, Sunggyu Kim, Mi Kyung Kim et al.
Multi-modal brain MRI provides essential complementary information for clinical diagnosis. In practice, however, acquiring all modalities is often constrained by time and cost. To address this, various methods have been proposed to generate missing modalities from available ones. Traditional approaches fall broadly into two types: paired and unpaired methods. Paired methods achieve high accuracy in synthesizing missing modalities, but obtaining large-scale paired datasets is typically impractical. Unpaired methods, in contrast, scale well but often fail to preserve critical anatomical features such as lesions. In this paper, we propose Fully Guided Schrödinger Bridge (FGSB), a novel framework that overcomes these limitations by enabling high-fidelity generation with extremely limited paired data. Furthermore, when given lesion-specific information, such as expert annotations, segmentation tools, or simple intensity thresholds for critical regions, FGSB can generate missing modalities while preserving these significant lesions with reduced data requirements. Our model comprises two stages: 1) Generation Phase: iteratively refines synthetic images using the paired target image and Gaussian noise; 2) Training Phase: learns optimal transformation pathways from the source to the target modality by mapping all intermediate states, ensuring consistent and high-fidelity synthesis. Experimental results across multiple datasets demonstrate that FGSB achieves performance comparable to models trained on large datasets while using only two subjects. Incorporating lesion-specific priors further improves the preservation of clinical features.
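As a rough illustration of the "intermediate states" between a source image and its paired target, the sketch below samples a standard Brownian bridge pinned at the source (t = 0) and the target (t = 1). It also builds a lesion mask from a simple intensity threshold, one of the lesion cues the abstract mentions, and uses that mask to up-weight an L1 loss. This is an assumption made for illustration: FGSB's actual bridge parameterization and its use of lesion priors may differ. The noise scale `sigma`, the threshold, and `lesion_weight` are hypothetical, and the images are toy flattened intensity lists.

```python
import math
import random
from typing import List

Image = List[float]  # a flattened image, for simplicity


def bridge_sample(x0: Image, x1: Image, t: float, sigma: float,
                  rng: random.Random) -> Image:
    """Draw an intermediate state x_t of a Brownian bridge pinned at the
    source image x0 (t = 0) and the paired target image x1 (t = 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))  # noise vanishes at both endpoints
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]


def threshold_mask(image: Image, threshold: float) -> List[int]:
    """Flag voxels brighter than a simple intensity threshold as 'lesion'."""
    return [1 if v > threshold else 0 for v in image]


def masked_l1(pred: Image, target: Image, mask: List[int],
              lesion_weight: float = 5.0) -> float:
    """Weighted L1 loss that emphasises voxels inside the lesion mask
    (the weighting scheme is a hypothetical choice)."""
    weights = [lesion_weight if m else 1.0 for m in mask]
    total = sum(w * abs(p - q) for w, p, q in zip(weights, pred, target))
    return total / sum(weights)


if __name__ == "__main__":
    rng = random.Random(0)
    source = [0.1, 0.2, 0.8, 0.3]  # toy source-modality intensities
    target = [0.2, 0.1, 0.9, 0.2]  # toy paired target-modality intensities
    for t in (0.0, 0.5, 1.0):
        print(t, bridge_sample(source, target, t, sigma=0.1, rng=rng))
    mask = threshold_mask(target, 0.5)  # -> [0, 0, 1, 0]
    print(masked_l1(source, target, mask))
```

The noise term scales with sqrt(t(1 - t)), so every sampled path starts exactly at the source and ends exactly at the target. This property makes a bridge a natural way to cover all intermediate states between a paired source and target.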
LG · Feb 8, 2022
Boosting Graph Neural Networks by Injecting Pooling in Message Passing
Hyeokjin Kwon, Jong-Min Lee
Graph neural networks (GNNs) have achieved tremendous success thanks to the message-passing (MP) layer, which updates the representation of a node by combining it with those of its neighbors and so handles variable-size and unordered graphs. Despite the fruitful progress of MP GNNs, their performance can suffer from over-smoothing, in which node representations become too similar and even indistinguishable from one another. Furthermore, it has been reported that intrinsic graph structures are smoothed out as the number of GNN layers increases. Inspired by the edge-preserving bilateral filters used in image processing, we propose a new, adaptable, and powerful MP framework to prevent over-smoothing. Our bilateral-MP estimates a pairwise modular gradient from the class information of nodes and uses this gradient when the aggregation function is applied, which further preserves the global graph structure. The proposed scheme generalizes to any ordinary MP GNN. Experiments on five medium-sized benchmark datasets with four state-of-the-art MP GNNs show that bilateral-MP improves performance by alleviating over-smoothing. Quantitative measurements further validate the effectiveness of the proposed mechanism in preventing over-smoothing.
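The bilateral-filter analogy can be sketched as a mean-aggregation step in which each neighbor's message is down-weighted when its class information differs from the receiving node's. In the same way, a bilateral filter down-weights pixels across an intensity edge. In this sketch, `class_probs` stands in for node class information (for example, predicted soft labels), and a Gaussian "range" kernel on their distance replaces the paper's pairwise modular gradient. Both are assumptions, so this is a generic bilateral-style aggregation rather than the authors' bilateral-MP. The function names and `sigma` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bilateral_aggregate(features: Dict[int, Vector],
                        class_probs: Dict[int, Vector],
                        neighbors: Dict[int, List[int]],
                        sigma: float = 0.5) -> Dict[int, Vector]:
    """One weighted mean-aggregation step: each neighbour's message is scaled by
    a Gaussian kernel on the distance between class distributions (hypothetical
    stand-in for bilateral-MP's modular gradient)."""
    out = {}
    for i, h_i in features.items():
        nodes = [i] + neighbors.get(i, [])  # include a self-loop
        weights = [math.exp(-sq_dist(class_probs[i], class_probs[j]) / (2 * sigma ** 2))
                   for j in nodes]
        z = sum(weights)  # self-weight is 1, so z >= 1
        out[i] = [sum(w * features[j][d] for w, j in zip(weights, nodes)) / z
                  for d in range(len(h_i))]
    return out


if __name__ == "__main__":
    feats = {0: [0.0], 1: [1.0], 2: [10.0]}
    probs = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}  # node 2 is another class
    nbrs = {0: [1, 2], 1: [0], 2: [0]}
    print(bilateral_aggregate(feats, probs, nbrs))
```

With plain mean aggregation, node 0 would be pulled strongly toward node 2, which belongs to a different class. The range kernel largely suppresses that cross-class message and keeps the boundary between classes sharp, which is the edge-preserving behavior the abstract borrows from bilateral filters.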