CL · AI · Sep 26, 2025

SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

arXiv:2510.02329v1 · 4 citations · h-index: 16
Originality: Incremental advance
AI Analysis

SelfJudge offers a broadly applicable way to speed up LLM inference across diverse NLP tasks, though the advance appears incremental because it builds on existing judge decoding methods.

The paper addresses the limited generalizability of judge decoding for speculative LLM inference. It proposes SelfJudge, which trains judge verifiers via self-supervision of the target model to measure semantic preservation across diverse NLP tasks, and reports better inference-accuracy trade-offs than judge decoding baselines.

Speculative decoding accelerates LLM inference by verifying candidate tokens from a draft model against a larger target model. Recent judge decoding boosts this process by relaxing the verification criteria, accepting draft tokens that may exhibit minor discrepancies from the target model output. However, existing methods are restricted by their reliance on human annotations or tasks with verifiable ground truths, limiting generalizability across diverse NLP tasks. We propose SelfJudge, which trains judge verifiers via self-supervision of the target model. Our method measures semantic preservation by assessing whether token-substituted responses preserve the meaning of original responses, enabling automatic verifier training across diverse NLP tasks. Our experiments show SelfJudge achieves better inference-accuracy trade-offs than judge decoding baselines, offering a broadly applicable solution for faster LLM inference.
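
To make the difference between strict and judge-relaxed verification concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. It assumes greedy verification (a draft token is accepted only if it equals the target model's top choice at that position), and the names `accept_prefix`, `toy_judge`, and `INTERCHANGEABLE` are hypothetical. The hand-written lookup table merely stands in for a trained judge verifier.

```python
from typing import Callable, Optional, Sequence

# Hypothetical judge signature: (position, draft_token, target_token) -> accept?
Judge = Callable[[int, int, int], bool]


def accept_prefix(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    judge: Optional[Judge] = None,
) -> int:
    """Count the draft tokens accepted before the first rejection.

    target_tokens[i] is assumed to be the target model's greedy choice
    at position i, given the draft tokens before position i.
    """
    accepted = 0
    for pos, (draft, target) in enumerate(zip(draft_tokens, target_tokens)):
        if draft == target:
            # Strict verification: exact agreement with the target model.
            accepted += 1
        elif judge is not None and judge(pos, draft, target):
            # Relaxed verification: the judge tolerates this mismatch.
            accepted += 1
        else:
            # First rejection ends the accepted prefix.
            break
    return accepted


# Toy stand-in for a trained judge: token pairs assumed to be
# semantically interchangeable (purely illustrative values).
INTERCHANGEABLE = {(8, 7)}


def toy_judge(pos: int, draft: int, target: int) -> bool:
    return (draft, target) in INTERCHANGEABLE


draft = [5, 8, 2, 9]
target = [5, 7, 2, 4]

strict_count = accept_prefix(draft, target)
relaxed_count = accept_prefix(draft, target, judge=toy_judge)
print(strict_count, relaxed_count)
```

In this toy run, strict verification stops at the first mismatch, while the relaxed variant keeps going until it reaches a mismatch the judge does not tolerate. Accepting a longer prefix per target-model pass is where the speedup would come from, and the quality of the judge determines how much accuracy is traded away.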

