SE AI HC
Oct 20, 2025

CosmoCore: Affective Dream-Replay Reinforcement Learning for Code Generation

arXiv:2510.18895v1
1 citation
h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses code generation errors for developers using AI assistants, offering an incremental improvement over existing RLHF methods.

The paper tackles hallucinated code and slow self-correction in large language models for code generation. It introduces CosmoCore, a neuroscience-inspired reinforcement learning architecture that uses affective signals to prioritize replay of buggy outputs, and reports a 48% reduction in hallucinated code and a 45% acceleration in self-correction.

We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). Motivated by human and animal learning, where embarrassment from mistakes drives rapid correction (as when a puppy learns to avoid repeating an error after a single scolding), CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). Episodes with strongly negative valence ("cringe" episodes), such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks such as HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g., syntax errors or logical bugs) by 48% and accelerates self-correction by 45%. Local experiments using Hugging Face models in a PySpark environment validate these gains, with code snippets provided for replication. Ablations confirm that valence tagging boosts curiosity in exploration and that pruning mitigates inefficiency. This framework extends reinforcement learning from human feedback (RLHF) toward more emotionally aware code assistants, with applications in IDEs and data pipelines. Code and the custom mini-world simulation are released.
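
The replay mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The class names `Episode` and `DreamQueue`, the thresholds, the value ranges for valence and surprise, and the reading of "five-fold replay" as a 5x sampling weight are all assumptions made for illustration. The valence and surprise tags are taken as given here rather than produced by an MLP.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One code-generation trajectory with its affective tags (hypothetical type)."""
    code: str
    valence: float   # assumed range [-1, 1]; negative means a "cringe" outcome
    surprise: float  # assumed range [0, 1]


@dataclass
class DreamQueue:
    """Toy replay buffer; all thresholds are illustrative assumptions."""
    cringe_threshold: float = -0.5  # valence at or below this counts as "cringe"
    surprise_floor: float = 0.1     # successes less surprising than this are pruned
    replay_factor: int = 5          # five-fold replay, as stated in the abstract
    episodes: list = field(default_factory=list)

    def add(self, ep: Episode) -> bool:
        """Store an episode unless it is a low-surprise success; return True if kept."""
        if ep.valence > 0 and ep.surprise < self.surprise_floor:
            return False  # pruned to limit overconfidence and buffer bloat
        self.episodes.append(ep)
        return True

    def replay_weights(self) -> list:
        """Relative sampling weight per stored episode (cringe episodes weigh more)."""
        return [
            self.replay_factor if ep.valence <= self.cringe_threshold else 1
            for ep in self.episodes
        ]

    def sample(self, k: int, rng=None) -> list:
        """Draw k episodes with replacement for an off-policy update."""
        if not self.episodes:
            return []
        rng = rng or random.Random()
        return rng.choices(self.episodes, weights=self.replay_weights(), k=k)
```

In this sketch a buggy output tagged with valence -0.9 is drawn five times as often as a neutral episode, while a correct but unsurprising output never enters the buffer. A real implementation would also need the tagging model and an off-policy learner, which the abstract mentions but does not specify.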
