CV · AI · Nov 28, 2025

Leveraging Textual Compositional Reasoning for Robust Change Captioning

arXiv:2511.22903v1
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of robustly describing changes between images for applications such as surveillance or autonomous systems. Its contribution is incremental: it builds on existing change-captioning methods and adds textual enhancements.

The paper tackles the problem of change captioning by integrating textual cues from Vision Language Models to capture subtle changes that visual features alone miss, achieving state-of-the-art results on benchmark datasets with a 5.2% improvement in CIDEr score.

Change captioning aims to describe changes between a pair of images. However, existing works rely on visual features alone, which often fail to capture subtle but meaningful changes because they lack the ability to represent explicitly structured information such as object relationships and compositional semantics. To alleviate this, we present CORTEX (COmpositional Reasoning-aware TEXt-guided), a novel framework that integrates complementary textual cues to enhance change understanding. In addition to capturing cues from pixel-level differences, CORTEX utilizes scene-level textual knowledge provided by Vision Language Models (VLMs) to extract richer image-text signals that reveal underlying compositional reasoning. CORTEX consists of three key modules: (i) an Image-level Change Detector that identifies low-level visual differences between paired images, (ii) a Reasoning-aware Text Extraction (RTE) module that uses VLMs to generate compositional reasoning descriptions implicit in visual features, and (iii) an Image-Text Dual Alignment (ITDA) module that aligns visual and textual features for fine-grained relational reasoning. This enables CORTEX to reason over visual and textual features and capture changes that are otherwise ambiguous in visual features alone.
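To make the data flow between the three modules concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Every name in it (`change_features`, `toy_vlm_describe`, `toy_text_encode`, `cross_attend`, `dual_align`, `run_sketch`, `DIM`) is hypothetical. The VLM is replaced by canned strings, the text encoder by deterministic pseudo-random word vectors, and the alignment step by plain scaled dot-product attention applied in both directions, which is only one plausible reading of "dual alignment".

```python
import zlib

import numpy as np

DIM = 16  # hypothetical shared feature width, chosen only for this toy


def change_features(feat_before, feat_after):
    """Stand-in for module (i): per-region difference of visual features."""
    return feat_after - feat_before


def toy_vlm_describe(image_id):
    """Stand-in for module (ii): canned scene descriptions instead of a real VLM."""
    canned = {
        "before": "a red cube left of a blue sphere",
        "after": "a red cube right of a blue sphere",
    }
    return canned[image_id]


def toy_text_encode(sentence, dim=DIM):
    """Assumed toy encoder: one deterministic pseudo-random vector per word."""
    rows = []
    for word in sentence.split():
        seed = zlib.crc32(word.encode("utf-8"))  # stable across runs
        rows.append(np.random.default_rng(seed).standard_normal(dim))
    return np.stack(rows)


def cross_attend(queries, context):
    """Scaled dot-product attention of `queries` over `context` rows."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ context


def dual_align(visual, textual):
    """Stand-in for module (iii): attend image-to-text and text-to-image."""
    return cross_attend(visual, textual), cross_attend(textual, visual)


def run_sketch(num_regions=4):
    rng = np.random.default_rng(0)
    feat_before = rng.standard_normal((num_regions, DIM))
    feat_after = feat_before.copy()
    feat_after[1] += 1.0  # simulate a change in one image region
    diff = change_features(feat_before, feat_after)
    text = np.concatenate([
        toy_text_encode(toy_vlm_describe("before")),
        toy_text_encode(toy_vlm_describe("after")),
    ])
    return dual_align(diff, text)


if __name__ == "__main__":
    vis_aligned, text_aligned = run_sketch()
    print(vis_aligned.shape, text_aligned.shape)
```

In a real system the two aligned feature sets would feed a caption decoder. The sketch stops before that step because the abstract does not describe the decoder.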

