CV | Apr 1, 2025

POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation

arXiv:2504.00640v1 | 23 citations | h-index: 18 | CVPR
Originality: Highly original
AI Analysis

POPEN addresses segmentation quality issues in large vision-language models (LVLMs). It is aimed at both researchers and practitioners and represents a strong incremental improvement.

The paper tackles imprecise segmentation and hallucinated text responses in LVLM-based reasoning segmentation. It introduces POPEN, a framework that combines preference-based optimization for fine-tuning with a preference-based ensemble at inference. The authors report state-of-the-art results, with less hallucination and higher segmentation accuracy than prior methods such as LISA and PixelLM.

Existing LVLM-based reasoning segmentation methods often suffer from imprecise segmentation results and hallucinations in their text responses. This paper introduces POPEN, a novel framework designed to address these issues and achieve improved results. POPEN includes a preference-based optimization method to finetune the LVLM, aligning it more closely with human preferences and thereby generating better text responses and segmentation results. Additionally, POPEN introduces a preference-based ensemble method for inference, which integrates multiple outputs from the LVLM using a preference-score-based attention mechanism for refinement. To better adapt to the segmentation task, we incorporate several task-specific designs in our POPEN framework, including a new approach for collecting segmentation preference data with a curriculum learning mechanism, and a novel preference optimization loss to refine the segmentation capability of the LVLM. Experiments demonstrate that our method achieves state-of-the-art performance in reasoning segmentation, exhibiting minimal hallucination in text responses and the highest segmentation accuracy compared to previous advanced methods like LISA and PixelLM. Project page is https://lanyunzhu.site/POPEN/
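
To make the ensemble idea more concrete, here is a minimal sketch of fusing several candidate masks using weights derived from preference scores. It is not the paper's method: the abstract does not specify the attention mechanism, so this substitutes a plain softmax-weighted average. The names `softmax`, `ensemble_masks`, `temperature`, and `threshold` are hypothetical, and the preference scores are assumed to come from some external scorer.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_masks(candidate_masks, preference_scores,
                   threshold=0.5, temperature=1.0):
    """Fuse candidate soft masks with preference-derived weights.

    candidate_masks: list of K masks, each an H x W nested list of
        foreground probabilities in [0, 1].
    preference_scores: list of K floats, higher meaning more preferred
        (assumed to come from a hypothetical preference scorer).
    Returns (fused_soft_mask, binary_mask).
    """
    if not candidate_masks or len(candidate_masks) != len(preference_scores):
        raise ValueError("need one preference score per candidate mask")

    # Convert scores to weights that sum to 1.
    weights = softmax(preference_scores, temperature)

    height = len(candidate_masks[0])
    width = len(candidate_masks[0][0])

    # Per-pixel weighted average of the candidates.
    fused = [
        [
            sum(w * mask[r][c] for w, mask in zip(weights, candidate_masks))
            for c in range(width)
        ]
        for r in range(height)
    ]

    # Threshold the fused probabilities into a binary mask.
    binary = [[1 if p >= threshold else 0 for p in row] for row in fused]
    return fused, binary


if __name__ == "__main__":
    masks = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two 1x2 candidate masks
    fused, binary = ensemble_masks(masks, preference_scores=[2.0, 0.0])
    print(fused, binary)
```

In this simplified form a candidate with a higher preference score contributes more to every pixel; a learned attention mechanism, as the abstract describes, could instead vary the weighting per region or per feature.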

