CL, AI, LG | Feb 24

SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing

Yifei Xu, Guilherme Potje, Shivam Shandilya, Tiancheng Yuan, Leonardo de Oliveira Nunes, Rakshanda Agarwal, Saeid Asgari, Adam Atkinson, Emre Kıcıman, Songwu Lu, Ranveer Chandra, Tusher Chakraborty

arXiv:2602.20751v12.14 citations | h-index: 62

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of creating aligned and robust rewards for RL post-training in open-ended generation. The advance is incremental: it builds on existing rubric-based methods.

The paper tackles the problem of scaling rubric construction for reward design in open-ended generation by introducing SibylSense, an inference-time learning approach that adapts a rubric generator via memory tuning and adversarial probing, resulting in more discriminative rubrics and improved RL performance over baselines.

Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is difficult: expert rubrics are costly, prompted rubrics are often superficial or inconsistent, and fixed-pool discriminative rubrics can saturate and drift, enabling reward hacking. We present SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items. Memory is updated via verifier-based item rewards measured by reference-candidate answer discriminative gaps from a handful of examples. SibylSense alternates memory tuning with a rubric-adversarial policy update that produces rubric-satisfying candidate answers, shrinking discriminative gaps and driving the rubric generator to capture new quality dimensions. Experiments on two open-ended tasks show that SibylSense yields more discriminative rubrics and improves downstream RL performance over static and non-adaptive baselines.
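The memory update described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Verifier` signature, the `discriminative_gap` and `MemoryBank` names, the moving-average update with a `decay` factor, and the keyword-matching `toy_verifier` that stands in for a real judge. It only shows the general idea that a rubric item is rewarded by how much better reference answers score on it than candidate answers do.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical verifier signature: returns a score in [0, 1] for how well
# `answer` satisfies `rubric_item`. A real system would call an LLM judge.
Verifier = Callable[[str, str], float]


def discriminative_gap(item: str, references: List[str],
                       candidates: List[str], verifier: Verifier) -> float:
    """Mean verifier score on references minus mean score on candidates."""
    ref_score = mean(verifier(item, a) for a in references)
    cand_score = mean(verifier(item, a) for a in candidates)
    return ref_score - cand_score


@dataclass
class MemoryBank:
    """Illustrative store of rubric items with a running reward estimate."""
    decay: float = 0.5  # assumed smoothing factor, not taken from the paper
    rewards: Dict[str, float] = field(default_factory=dict)

    def update(self, item: str, gap: float) -> None:
        # Exponential moving average; a new item starts at its first gap.
        old = self.rewards.get(item, gap)
        self.rewards[item] = self.decay * old + (1 - self.decay) * gap

    def top_items(self, k: int) -> List[str]:
        # Items with the largest estimated gap would be shown to the
        # frozen rubric generator as context.
        return sorted(self.rewards, key=self.rewards.get, reverse=True)[:k]


def toy_verifier(item: str, answer: str) -> float:
    # Stand-in for an LLM judge: 1.0 if the keyword appears in the answer.
    return 1.0 if item.lower() in answer.lower() else 0.0


references = ["Cites evidence and states limitations."]
candidates = ["Cites evidence only."]

bank = MemoryBank()
for item in ["limitations", "evidence"]:
    gap = discriminative_gap(item, references, candidates, toy_verifier)
    bank.update(item, gap)
```

In this toy run, the item "limitations" separates the reference from the candidate while "evidence" does not, so only the former earns a positive reward. If a later candidate also satisfied "limitations", its gap would fall toward zero and its stored reward would decay, which loosely mirrors the abstract's point that adversarial candidates shrink gaps and push the generator toward new quality dimensions.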
