CL · Feb 1

Minimizing Mismatch Risk: A Prototype-Based Routing Framework for Zero-shot LLM-generated Text Detection

Ke Sun, Guangsheng Bao, Han Cui, Yue Zhang

arXiv:2602.01240v1

Originality: Incremental advance

AI Analysis

This work targets robust detection of AI-generated text, which matters for security and content moderation. The advance is incremental because it builds on existing zero-shot methods.

The paper addresses a weakness of zero-shot detectors of LLM-generated text: detection performance varies when the surrogate model is mismatched with the unknown source model. It proposes a routing framework that selects the most appropriate surrogate for each input, and it reports improved detection on the EvoBench and MAGE benchmarks.

Zero-shot methods detect LLM-generated text by computing statistical signatures using a surrogate model. Existing approaches typically employ a fixed surrogate for all inputs regardless of the unknown source. We systematically examine this design and find that detection performance varies substantially depending on surrogate-source alignment. We observe that while no single surrogate achieves optimal performance universally, a well-matched surrogate typically exists within a diverse pool for any given input. This finding transforms robust detection into a routing problem: selecting the most appropriate surrogate for each input. We propose DetectRouter, a prototype-based framework that learns text-detector affinity through two-stage training. The first stage constructs discriminative prototypes from white-box models; the second generalizes to black-box sources by aligning geometric distances with observed detection scores. Experiments on EvoBench and MAGE benchmarks demonstrate consistent improvements across multiple detection criteria and model families.
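
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the DetectRouter implementation: the abstract does not give the embedding model, the prototype construction, or the two-stage training, so the example assumes a text has already been mapped to a small vector and that each surrogate has one hand-picked prototype vector. The names `route`, `euclidean`, and `surrogate_a`/`surrogate_b`/`surrogate_c` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def route(embedding, prototypes):
    """Return the name of the surrogate whose prototype is nearest to the embedding.

    Assumption: a smaller geometric distance stands in for a better
    text-detector match, loosely following the abstract's description.
    """
    return min(prototypes, key=lambda name: euclidean(embedding, prototypes[name]))

# Hypothetical prototypes, one per surrogate model (made-up values).
prototypes = {
    "surrogate_a": [0.0, 0.0],
    "surrogate_b": [1.0, 1.0],
    "surrogate_c": [-1.0, 2.0],
}

# Hypothetical embedding of an input text.
chosen = route([0.9, 0.8], prototypes)
print(chosen)
```

In a full pipeline, the selected surrogate would then be used to compute the zero-shot detection statistic for that input, in place of a single fixed surrogate for all inputs.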

