CV · Apr 11, 2025

CMIP-CIL: A Cross-Modal Benchmark for Image-Point Class Incremental Learning

arXiv:2504.08422v1 · h-index: 5 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses incremental learning for robots in dynamic environments that must continually learn from 2D images and 3D points. The advance appears incremental, since it builds on existing contrastive learning and prototype-based techniques.

The paper tackles cross-modal catastrophic forgetting in image-point class incremental learning for 3D-points-vision robots, proposing a benchmark CMIP-CIL and a method that achieves state-of-the-art results, outperforming baseline methods by a large margin.

Image-point class incremental learning helps the 3D-points-vision robots continually learn category knowledge from 2D images, improving their perceptual capability in dynamic environments. However, some incremental learning methods address unimodal forgetting but fail in cross-modal cases, while others handle modal differences within training/testing datasets but assume no modal gaps between them. We first explore this cross-modal task, proposing a benchmark CMIP-CIL and relieving the cross-modal catastrophic forgetting problem. It employs masked point clouds and rendered multi-view images within a contrastive learning framework in pre-training, empowering the vision model with the generalizations of image-point correspondence. In the incremental stage, by freezing the backbone and promoting object representations close to their respective prototypes, the model effectively retains and generalizes knowledge across previously seen categories while continuing to learn new ones. We conduct comprehensive experiments on the benchmark datasets. Experiments prove that our method achieves state-of-the-art results, outperforming the baseline methods by a large margin.
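The incremental stage described in the abstract (a frozen backbone, with object representations kept close to per-class prototypes) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain nearest-class-mean classifier, and the `PrototypeMemory` class, its method names, and the toy feature vectors are all hypothetical. The features are assumed to come from a frozen encoder that maps images and point clouds into one shared embedding space.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class PrototypeMemory:
    """Hypothetical store holding one mean feature vector (prototype) per class."""

    def __init__(self):
        self.prototypes = {}

    def add_task(self, features, labels):
        """Add prototypes for the classes of a new incremental task.

        Assumption: `features` come from a frozen backbone, so prototypes of
        earlier classes stay valid and are not recomputed here.
        """
        grouped = defaultdict(list)
        for feat, label in zip(features, labels):
            grouped[label].append(l2_normalize(feat))
        for label, feats in grouped.items():
            dim = len(feats[0])
            mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
            self.prototypes[label] = l2_normalize(mean)

    def predict(self, feature):
        """Return the class whose prototype has the highest cosine similarity."""
        query = l2_normalize(feature)
        return max(
            self.prototypes,
            key=lambda c: sum(q * p for q, p in zip(query, self.prototypes[c])),
        )


memory = PrototypeMemory()
# Task 1: toy "image" features for two classes.
memory.add_task(
    [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1]],
    ["chair", "chair", "table"],
)
# Task 2: a new class arrives; the earlier prototypes are left untouched.
memory.add_task([[0.0, 0.1, 1.0]], ["lamp"])

# Toy "point cloud" features, assumed to live in the same embedding space.
predictions = [memory.predict(f) for f in ([0.8, 0.2, 0.0], [0.1, 0.0, 0.9])]
print(predictions)
```

Because the backbone is assumed frozen, adding a class only appends a prototype, which is one simple way to avoid overwriting what was learned for earlier classes. The method in the paper additionally trains representations to move toward their prototypes, which this sketch does not attempt.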

Code Implementations: 1 repo
