CL · Jun 18, 2025

CC-LEARN: Cohort-based Consistency Learning

Xiao Ye, Shaswat Shrivastava, Zhaonan Li, Jacob Dineen, Shijie Lu, Avneet Ahuja, Ming Shen, Zhikun Xu, Ben Zhou

arXiv:2506.15662v1 · 6.72 citations · h-index: 5

Originality: Highly original

AI Analysis

This work addresses robust reasoning in large language models. It is an incremental improvement: a novel method applied to a known bottleneck.

The paper tackles inconsistent reasoning in large language models by introducing Cohort-based Consistency Learning (CC-Learn), a reinforcement learning framework that trains on cohorts of similar questions to improve reliability. The result is higher accuracy and more stable reasoning on benchmarks such as ARC-Challenge and StrategyQA.

Large language models excel at many tasks but still struggle with consistent, robust reasoning. We introduce Cohort-based Consistency Learning (CC-Learn), a reinforcement learning framework that improves the reliability of LLM reasoning by training on cohorts of similar questions derived from shared programmatic abstractions. To enforce cohort-level consistency, we define a composite objective that combines cohort accuracy, a retrieval bonus for effective problem decomposition, and a rejection penalty for trivial or invalid lookups. Unlike supervised fine-tuning, reinforcement learning can optimize this objective directly. Optimizing this reward guides the model to adopt uniform reasoning patterns across all cohort members. Experiments on challenging reasoning benchmarks (including ARC-Challenge and StrategyQA) show that CC-Learn improves both accuracy and reasoning stability over pretrained and SFT baselines. These results demonstrate that cohort-level RL effectively enhances reasoning consistency in LLMs.
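The composite objective described in the abstract can be illustrated with a short sketch. This is not the paper's implementation. It assumes that one sampled reasoning program is executed on every question in a cohort, and that the reward is cohort accuracy plus a capped retrieval bonus minus a rejection penalty. The names `MemberResult` and `cohort_reward`, the weights `retrieval_bonus`, `rejection_penalty`, and `max_bonus`, and their default values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MemberResult:
    """Outcome of running one shared reasoning program on one cohort member.

    Hypothetical schema for illustration only.
    """
    correct: bool          # did the final answer match the gold label?
    valid_lookups: int     # retrieval sub-questions judged useful for decomposition
    rejected_lookups: int  # retrieval sub-questions rejected as trivial or invalid


def cohort_reward(
    results: Sequence[MemberResult],
    retrieval_bonus: float = 0.1,    # hypothetical weight per valid lookup
    rejection_penalty: float = 0.2,  # hypothetical weight per rejected lookup
    max_bonus: float = 0.3,          # hypothetical cap so lookups cannot outweigh accuracy
) -> float:
    """Score one reasoning program against an entire cohort.

    Assumed form: cohort accuracy + capped mean retrieval bonus - mean rejection penalty.
    """
    if not results:
        raise ValueError("cohort must contain at least one question")
    n = len(results)
    accuracy = sum(r.correct for r in results) / n  # fraction of members answered correctly
    avg_valid = sum(r.valid_lookups for r in results) / n
    avg_rejected = sum(r.rejected_lookups for r in results) / n
    bonus = min(retrieval_bonus * avg_valid, max_bonus)
    penalty = rejection_penalty * avg_rejected
    return accuracy + bonus - penalty


if __name__ == "__main__":
    # A program that generalizes across most of the cohort.
    consistent = [
        MemberResult(True, 2, 0),
        MemberResult(True, 1, 1),
        MemberResult(False, 2, 0),
        MemberResult(True, 1, 0),
    ]
    # A program that only happens to work on one member.
    brittle = [
        MemberResult(True, 1, 0),
        MemberResult(False, 1, 0),
        MemberResult(False, 1, 0),
        MemberResult(False, 1, 0),
    ]
    print(round(cohort_reward(consistent), 2))  # prints 0.85
    print(round(cohort_reward(brittle), 2))     # prints 0.35
```

Because the reward is computed over the whole cohort rather than per question, a program that answers only one member correctly scores far lower than one that transfers across members. Under these assumptions, this is the pressure toward uniform reasoning patterns that the abstract describes.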

