CVSep 10, 2021Code
LibFewShot: A Comprehensive Library for Few-shot LearningWenbin Li, Ziyi, Wang et al.
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years. Some recent studies implicitly show that many generic techniques or ``tricks'', such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method. Moreover, different works may employ different software platforms, backbone architectures and input image sizes, making fair comparisons difficult and practitioners struggle with reproducibility. To address these situations, we propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing eighteen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch. Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks. In addition, with respect to the recent doubts on the necessity of meta- or episodic-training mechanism, our evaluation results confirm that such a mechanism is still necessary especially when combined with pre-training. We hope our work can not only lower the barriers for beginners to enter the area of few-shot learning but also elucidate the effects of nontrivial tricks to facilitate intrinsic research on few-shot learning. The source code is available from https://github.com/RL-VIG/LibFewShot.
CVSep 2, 2025
ContextFusion and Bootstrap: An Effective Approach to Improve Slot Attention-Based Object-Centric LearningPinzhuo Tian, Shengjie Yang, Hang Yu et al.
A key human ability is to decompose a scene into distinct objects and use their relationships to understand the environment. Object-centric learning aims to mimic this process in an unsupervised manner. Recently, the slot attention-based framework has emerged as a leading approach in this area and has been widely used in various downstream tasks. However, existing slot attention methods face two key limitations: (1) a lack of high-level semantic information. In current methods, image areas are assigned to slots based on low-level features such as color and texture. This makes the model overly sensitive to low-level features and limits its understanding of object contours, shapes, or other semantic characteristics. (2) The inability to fine-tune the encoder. Current methods require a stable feature space throughout training to enable reconstruction from slots, which restricts the flexibility needed for effective object-centric learning. To address these limitations, we propose a novel ContextFusion stage and a Bootstrap Branch, both of which can be seamlessly integrated into existing slot attention models. In the ContextFusion stage, we exploit semantic information from the foreground and background, incorporating an auxiliary indicator that provides additional contextual cues about them to enrich the semantic content beyond low-level features. In the Bootstrap Branch, we decouple feature adaptation from the original reconstruction phase and introduce a bootstrap strategy to train a feature-adaptive mechanism, allowing for more flexible adaptation. Experimental results show that our method significantly improves the performance of different SOTA slot attention models on both simulated and real-world datasets.
LGJul 23, 2021
Improving the Generalization of Meta-learning on Unseen Domains via Adversarial ShiftPinzhuo Tian, Yao Gao
Meta-learning provides a promising way for learning to efficiently learn and achieves great success in many applications. However, most meta-learning literature focuses on dealing with tasks from a same domain, making it brittle to generalize to tasks from the other unseen domains. In this work, we address this problem by simulating tasks from the other unseen domains to improve the generalization and robustness of meta-learning method. Specifically, we propose a model-agnostic shift layer to learn how to simulate the domain shift and generate pseudo tasks, and develop a new adversarial learning-to-learn mechanism to train it. Based on the pseudo tasks, the meta-learning model can learn cross-domain meta-knowledge, which can generalize well on unseen domains. We conduct extensive experiments under the domain generalization setting. Experimental results demonstrate that the proposed shift layer is applicable to various meta-learning frameworks. Moreover, our method also leads to state-of-the-art performance on different cross-domain few-shot classification benchmarks and produces good results on cross-domain few-shot regression.
CVNov 23, 2019
Differentiable Meta-learning Model for Few-shot Semantic SegmentationPinzhuo Tian, Zhangkai Wu, Lei Qi et al.
To address the annotation scarcity issue in some cases of semantic segmentation, there have been a few attempts to develop the segmentation model in the few-shot learning paradigm. However, most existing methods only focus on the traditional 1-way segmentation setting (i.e., one image only contains a single object). This is far away from practical semantic segmentation tasks where the K-way setting (K>1) is usually required by performing the accurate multi-object segmentation. To deal with this issue, we formulate the few-shot semantic segmentation task as a learning-based pixel classification problem and propose a novel framework called MetaSegNet based on meta-learning. In MetaSegNet, an architecture of embedding module consisting of the global and local feature branches is developed to extract the appropriate meta-knowledge for the few-shot segmentation. Moreover, we incorporate a linear model into MetaSegNet as a base learner to directly predict the label of each pixel for the multi-object segmentation. Furthermore, our MetaSegNet can be trained by the episodic training mechanism in an end-to-end manner from scratch. Experiments on two popular semantic segmentation datasets, i.e., PASCAL VOC and COCO, reveal the effectiveness of the proposed MetaSegNet in the K-way few-shot semantic segmentation task.