CV · Mar 4, 2024

Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection

arXiv:2403.01680v3 · 18 citations · h-index: 10 · Has Code · NIPS
Originality: Highly original
AI Analysis

The paper addresses the challenge of adapting a model to diverse domains without retraining it from scratch while maintaining its flexibility and performance. The contribution is incremental, but it matters for real-world applications.

The paper tackles the problem of incrementally adapting pre-trained vision-language object detection models to specialized domains while preserving their zero-shot generalization capabilities. After training on the ODinW-13 datasets, its method improves zero-shot generalizability by 13.91 and 8.74 AP over the CL-DETR and iDETR baselines, respectively.

This paper presents Incremental Vision-Language Object Detection (IVLOD), a novel learning task designed to incrementally adapt pre-trained Vision-Language Object Detection Models (VLODMs) to various specialized domains while simultaneously preserving their zero-shot generalization capabilities for the generalized domain. To address this new challenge, we present Zero-interference Reparameterizable Adaptation (ZiRa), a novel method that introduces a Zero-interference Loss and reparameterization techniques to tackle IVLOD without incurring additional inference cost or a significant increase in memory usage. Comprehensive experiments on the COCO and ODinW-13 datasets demonstrate that ZiRa effectively safeguards the zero-shot generalization ability of VLODMs while continuously adapting to new tasks. Specifically, after training on the ODinW-13 datasets, ZiRa outperforms CL-DETR and iDETR, boosting zero-shot generalizability by a substantial 13.91 and 8.74 AP, respectively. Our code is available at https://github.com/JarintotionDin/ZiRaGroundingDINO.
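The abstract credits reparameterization with keeping inference cost flat. Below is a minimal sketch of that general idea, not ZiRa's actual implementation. It assumes a frozen linear layer W plus a trainable low-rank parallel branch (down-projection D, up-projection U). During training both branches run side by side. Afterwards the branch is folded into the frozen weight as W' = W + U D, so inference needs only one matrix. The names `dual_branch_forward`, `reparameterize`, and `branch_l1` are hypothetical. The L1 penalty on the branch output is only an illustrative guess at what a "zero-interference" term might measure; it is not the paper's Zero-interference Loss.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (rows x cols) by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    k, n = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(len(a))]


def dual_branch_forward(w_frozen: Matrix, down: Matrix, up: Matrix,
                        x: Vector) -> Vector:
    """Training-time forward pass: frozen output plus adapter-branch output."""
    base = matvec(w_frozen, x)
    delta = matvec(up, matvec(down, x))  # low-rank path: U (D x)
    return [b + d for b, d in zip(base, delta)]


def reparameterize(w_frozen: Matrix, down: Matrix, up: Matrix) -> Matrix:
    """Fold the adapter into the frozen weight: W' = W + U D."""
    delta_w = matmul(up, down)
    return [[w + d for w, d in zip(row_w, row_d)]
            for row_w, row_d in zip(w_frozen, delta_w)]


def branch_l1(down: Matrix, up: Matrix, x: Vector) -> float:
    """Hypothetical interference measure: L1 norm of the branch output.
    A small value means the adapter barely shifts the frozen layer's output."""
    return sum(abs(v) for v in matvec(up, matvec(down, x)))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
    D = [[0.5, 0.5]]              # rank-1 down-projection (1x2)
    U = [[1.0], [2.0]]            # rank-1 up-projection (2x1)
    x = [2.0, 4.0]
    print(dual_branch_forward(W, D, U, x))    # two branches at training time
    print(matvec(reparameterize(W, D, U), x))  # one merged matrix at inference
    print(branch_l1(D, U, x))
```

Because the merged layer produces the same output as the two-branch layer, the adaptation adds no computation at inference. During training, the extra memory is limited to the small D and U matrices.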

Code Implementations: 1 repo