CV · Nov 11, 2025

CSF-Net: Context-Semantic Fusion Network for Large Mask Inpainting

arXiv:2511.07987v1 · 3.6 · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of generating plausible content for heavily occluded images in computer vision applications. It represents an incremental improvement that builds on existing models rather than replacing them.

The paper tackles large-mask image inpainting by proposing CSF-Net, a semantic-guided framework that uses a pretrained amodal completion model to generate structure-aware candidates and fuses them with contextual features via transformers. The result is reduced object hallucination and enhanced visual realism on the Places365 and COCOA datasets.
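The fusion step can be pictured with a minimal cross-attention sketch. It rests on assumptions not taken from the paper: context features from the visible region act as queries, tokens from the amodal-completion candidates act as keys and values, and learned projections, multiple heads, and normalization are omitted. The names `softmax` and `cross_attention` are hypothetical; this illustrates generic transformer cross-attention, not CSF-Net's actual architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(context: List[Vector], candidates: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: each context token (query) attends over the
    candidate tokens (keys and values) with scaled dot-product attention.
    No learned Q/K/V projections are used in this simplified sketch."""
    d = len(context[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in context:
        # Similarity between this context token and every candidate token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in candidates]
        weights = softmax(scores)
        # Attention-weighted sum of candidate tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, candidates)) for j in range(d)]
        # Residual connection keeps the original context information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


if __name__ == "__main__":
    context = [[1.0, 0.0], [0.0, 1.0]]          # toy visible-region features
    candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy candidate features
    print(cross_attention(context, candidates))
```

Each context token ends up pulled toward the candidate tokens it resembles most, which is the intuition behind using candidates as semantic priors for the missing region.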

In this paper, we propose a semantic-guided framework to address the challenging problem of large-mask image inpainting, where essential visual content is missing and contextual cues are limited. To compensate for the limited context, we leverage a pretrained Amodal Completion (AC) model to generate structure-aware candidates that serve as semantic priors for the missing regions. We introduce Context-Semantic Fusion Network (CSF-Net), a transformer-based fusion framework that fuses these candidates with contextual features to produce a semantic guidance image for image inpainting. This guidance improves inpainting quality by promoting structural accuracy and semantic consistency. CSF-Net can be seamlessly integrated into existing inpainting models without architectural changes and consistently enhances performance across diverse masking conditions. Extensive experiments on the Places365 and COCOA datasets demonstrate that CSF-Net effectively reduces object hallucination while enhancing visual realism and semantic alignment. The code for CSF-Net is available at https://github.com/chaeyeonheo/CSF-Net.
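The abstract states that the guidance image can be used with existing inpainting models without architectural changes, but it does not say exactly how the guidance is consumed. A minimal sketch, assuming the guidance simply replaces the masked pixels of the input before an unmodified model runs, is shown below. The helpers `composite_guidance` and `inpaint_with_guidance` and the `inpaint_model` callable are hypothetical, and images are toy grayscale lists.

```python
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[int]]


def composite_guidance(image: Image, mask: Mask, guidance: Image) -> Image:
    """Fill masked pixels (mask == 1) from the guidance image and keep
    known pixels (mask == 0) from the original image. Hypothetical helper."""
    return [
        [g if m else x for x, m, g in zip(img_row, mask_row, guide_row)]
        for img_row, mask_row, guide_row in zip(image, mask, guidance)
    ]


def inpaint_with_guidance(
    image: Image,
    mask: Mask,
    guidance: Image,
    inpaint_model: Callable[[Image, Mask], Image],
) -> Image:
    # The inpainting model is a black box: only its input changes,
    # so no architectural modification is needed under this assumption.
    guided_input = composite_guidance(image, mask, guidance)
    return inpaint_model(guided_input, mask)


if __name__ == "__main__":
    image = [[0.1, 0.2], [0.3, 0.4]]
    mask = [[0, 1], [1, 0]]
    guidance = [[9.0, 9.0], [9.0, 9.0]]
    identity_model = lambda img, m: img  # stand-in for a real inpainting network
    print(inpaint_with_guidance(image, mask, guidance, identity_model))
```

Treating the model as a callable keeps the sketch agnostic to which inpainting backbone is used, which mirrors the plug-in claim without asserting how the authors implement it.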

