CV · AI · Feb 26

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

arXiv:2602.22740v1
h-index: 9
Originality: Incremental advance
AI Analysis

This paper provides an incremental improvement for researchers working on referring image segmentation.

This paper addresses Referring Image Segmentation (RIS) by introducing Alignment-Aware Masked Learning (AML), a training strategy that estimates pixel-level vision-language alignment and filters out poorly aligned regions. This method achieves state-of-the-art performance on RefCOCO datasets and improves robustness.

Referring Image Segmentation (RIS) aims to segment the object in an image that a natural language expression refers to. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy that explicitly estimates pixel-level vision-language alignment, filters out poorly aligned regions during optimization, and focuses learning on trustworthy cues. The authors report state-of-the-art performance on the RefCOCO datasets and improved robustness to diverse descriptions and scenarios.
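The general idea of filtering poorly aligned regions out of the loss can be illustrated with a small sketch. This is not the paper's implementation: the summary does not say how alignment is estimated or which loss is used, so the cosine-similarity score, the fixed `threshold`, the binary cross-entropy loss, and all function names below are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def alignment_mask(pixel_feats, text_feat, threshold=0.0):
    """Assumed alignment estimate: keep (1) pixels whose feature is similar
    enough to the text embedding, drop (0) the rest."""
    return [1 if cosine(f, text_feat) >= threshold else 0 for f in pixel_feats]

def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over kept pixels only (assumed loss)."""
    total, kept = 0.0, 0
    for p, t, m in zip(probs, targets, mask):
        if m:
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            kept += 1
    return total / kept if kept else 0.0

# Toy data: four "pixels" with 2-D features and one text embedding.
pixel_feats = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0], [0.0, -1.0]]
text_feat = [1.0, 0.0]
probs = [0.9, 0.7, 0.4, 0.2]    # predicted foreground probabilities
targets = [1, 1, 0, 0]          # ground-truth mask values

mask = alignment_mask(pixel_feats, text_feat, threshold=0.5)
loss = masked_bce(probs, targets, mask)
```

In this toy case only the first two pixels pass the threshold, so the last two contribute nothing to the loss. A real system would compute the scores on dense feature maps and would likely use a learned or adaptive criterion instead of a fixed threshold.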

