Fake-HR1: Rethinking Reasoning of Vision Language Model for Synthetic Image Detection
This work addresses efficiency issues in synthetic image detection for applications that require real-time processing. It is incremental, since it builds on existing reasoning methods.
The paper tackles the excessive resource overhead of reasoning-based synthetic image detection by proposing Fake-HR1, a model that adaptively decides when to reason. Compared with existing LLMs, this improves both detection performance and response efficiency.
Recent studies have shown that incorporating Chain-of-Thought (CoT) reasoning into the detection process can improve a model's ability to detect synthetic images. However, excessively long reasoning incurs substantial resource overhead, including token consumption and latency, and this overhead is especially wasteful for obviously generated forgeries. To address this issue, we propose Fake-HR1, a large-scale hybrid-reasoning model that, to the best of our knowledge, is the first to adaptively determine whether reasoning is necessary based on the characteristics of each query in the synthetic image detection task. To achieve this, we design a two-stage training framework: we first perform Hybrid Fine-Tuning (HFT) for cold-start initialization, then apply online reinforcement learning with Hybrid-Reasoning Grouped Policy Optimization (HGRPO) so that the model implicitly learns which reasoning mode to select. Experimental results show that Fake-HR1 adaptively performs reasoning across different types of queries, surpassing existing LLMs in both reasoning ability and synthetic image detection performance while significantly improving response efficiency.
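The abstract does not specify HGRPO's objective or reward. As a rough illustration of the group-relative idea behind GRPO (Group Relative Policy Optimization), the sketch below scores a group of sampled responses for one image, some with reasoning and some without, and standardizes each reward against the group. The `Rollout` fields, the `reward` function, and its per-token `length_penalty` are hypothetical assumptions for illustration, not the paper's reward design. Under such a reward, a short correct answer earns a positive advantage on an easy image, while reasoning earns it when the short answer is wrong.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    reasoned: bool   # hypothetical flag: True if the response used the long-CoT mode
    correct: bool    # True if the real/fake verdict matches the ground-truth label
    num_tokens: int  # length of the generated response in tokens


def reward(r: Rollout, length_penalty: float = 0.001) -> float:
    """Hypothetical reward: +1 for a correct verdict, minus a small per-token cost."""
    return (1.0 if r.correct else 0.0) - length_penalty * r.num_tokens


def group_relative_advantages(group: list[Rollout], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward by the group's mean and std."""
    rewards = [reward(r) for r in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


# Easy (obvious fake): both modes are correct, so the shorter no-reasoning answers win.
easy = [Rollout(False, True, 20)] * 2 + [Rollout(True, True, 400)] * 2
# Hard: the no-reasoning answers are wrong, so the reasoning answers win.
hard = [Rollout(False, False, 20)] * 2 + [Rollout(True, True, 400)] * 2

print([round(a, 2) for a in group_relative_advantages(easy)])  # → [1.0, 1.0, -1.0, -1.0]
print([round(a, 2) for a in group_relative_advantages(hard)])  # → [-1.0, -1.0, 1.0, 1.0]
```

Standardizing within a group means no separate value model is needed: each response is judged only against its siblings for the same image, which is what lets a length-aware reward push the policy toward skipping reasoning when it is unnecessary.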