CL · Jun 8, 2025

History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM

Andrew Kiruluta, Andreas Lemos, Priscilla Burity

arXiv:2506.11108v1 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses multi-turn dialogue and chain-of-thought tasks. The advance is incremental because it extends an existing single-turn approach rather than introducing a new one.

The paper extends a self-supervised reinforcement framework to multi-turn dialogue and chain-of-thought reasoning, using vLLM to capture cross-attention weights during generation. It also discusses practical trade-offs, including an entropy-based clamping mechanism intended to prevent attention collapse on early context.

We present CAGSR-vLLM-MTC, an extension of our Self-Supervised Cross-Attention-Guided Reinforcement (CAGSR) framework, now implemented on the high-performance vLLM runtime, to address both multi-turn dialogue and chain-of-thought reasoning. Building upon our original single-turn approach, we first instrumented vLLM's C++/CUDA kernels to asynchronously capture per-layer, per-head cross-attention weights during generation. We then generalized our self-supervised reward function to accumulate attention signals over entire conversation histories and intermediate chain-of-thought steps. We discuss practical trade-offs, including an entropy-based clamping mechanism to prevent attention collapse on early context, and outline future directions for multi-party dialogues and hierarchical reasoning.
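
The abstract does not give the reward formula, so the following is only an illustrative sketch of the general idea: accumulate an attention-based signal over the steps of a conversation or reasoning chain, and clamp the contribution of any step whose attention distribution has collapsed (low entropy). The function names, the use of peak attention mass as the "focus" signal, and the `min_entropy` and `cap` parameters are all assumptions made for illustration, not the authors' method.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def clamped_attention_reward(step_attn, min_entropy=0.5, cap=0.5):
    """Average a per-step attention signal over turns or reasoning steps.

    step_attn: list of attention distributions over history tokens,
               one distribution per turn / chain-of-thought step.
    min_entropy, cap: hypothetical hyperparameters. When a step's
               attention entropy falls below min_entropy (attention has
               collapsed onto very few tokens), its signal is capped.
    """
    total = 0.0
    for p in step_attn:
        focus = max(p)  # assumed signal: peak attention mass
        if entropy(p) < min_entropy:
            focus = min(focus, cap)  # clamp collapsed steps
        total += focus
    return total / len(step_attn)


# Hypothetical example: one spread-out step and one collapsed step.
history = [
    [0.25, 0.25, 0.25, 0.25],  # high entropy, kept as-is
    [0.97, 0.01, 0.01, 0.01],  # low entropy, clamped to cap
]
reward = clamped_attention_reward(history)
```

In this toy setting the collapsed step contributes at most `cap` to the average, so a policy cannot raise the reward simply by concentrating all attention on a single early token.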
