Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning
The work addresses a bottleneck in transformer-based reasoning that is relevant to AI researchers, though it appears incremental because it builds on the existing chain-of-model framework.
The paper tackles the loss of long-range dependencies in chain-of-model reasoning by proposing Graph-of-Causal Evolution (GoCE). GoCE maps token representations to a sparse causal adjacency matrix and uses causal-masked attention and causal-MoE to carry the causal constraints through every layer. Compared with baseline LLMs on CLUTRR, CLADDER, EX-FEVER, and CausalQA, it reports better capture of long-range causal dependencies and an improved ability to self-evolve.
In the chain-of-model (CoM), each subchain relies only on the information of the previous subchain, and the causal mask blocks the flow of global context between multi-level subchains, so long-range dependencies may be lost. To address this problem, this work proposes a graph of causal evolution (GoCE). Its core principle is to map the implicit token representation into a differentiable and sparse causal adjacency matrix, and then to carry the causal constraints through each layer of computation using causal-masked attention and causal-MoE. By combining an intervention consistency loss test with a self-evolution gate, GoCE achieves a dynamic balance between causal structure learning and adaptive updating of the transformer architecture. Experimental environments were built in sandboxes using Claude Sonnet 4, o4-mini-high, and DeepSeek R1 respectively, each with the transformer variant architecture introduced in GoCE. GoCE is evaluated on publicly available datasets, including CLUTRR, CLADDER, EX-FEVER, and CausalQA, and compared with the baseline LLMs. The results show that GoCE strengthens the transformer's ability to capture long-range causal dependencies and also improves its ability to self-evolve. It not only surpasses CoM in its design principles, but also provides experience for future research on causal learning and continuous adaptive improvement.
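To make the core principle more concrete, here is a minimal sketch of how token representations might be turned into a sparse causal adjacency matrix that then gates attention. It is an illustration under stated assumptions, not the paper's implementation. The function names `sparse_causal_adjacency` and `causal_masked_attention`, the sigmoid edge score, and the hard top-k sparsification are all hypothetical choices, since the abstract does not specify them. In particular, hard top-k is a stand-in that is not differentiable with respect to dropped edges, whereas the paper describes a differentiable matrix. Causal-MoE, the intervention consistency loss test, and the self-evolution gate are left out.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sparse_causal_adjacency(hidden, top_k=2):
    """Return an n x n matrix A where A[i][j] > 0 means token j may influence token i.

    Assumption: edge strength is a sigmoid of the scaled dot product between
    token vectors. The paper's actual scoring function may differ.
    """
    n, d = len(hidden), len(hidden[0])
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Score only positions j <= i so the autoregressive order is respected.
        scores = {
            j: sigmoid(dot(hidden[i], hidden[j]) / math.sqrt(d))
            for j in range(i + 1)
        }
        # Hard top-k sparsification (illustrative stand-in for a differentiable one).
        keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
        keep.add(i)  # always keep the self-edge so no row is empty
        for j in keep:
            adj[i][j] = scores[j]
    return adj


def causal_masked_attention(hidden, adj):
    """Single-head attention in which adj decides which keys each query can see."""
    n, d = len(hidden), len(hidden[0])
    out = []
    for i in range(n):
        allowed = [j for j in range(n) if adj[i][j] > 0.0]
        # Adding log(adj) biases attention toward stronger causal edges;
        # positions with adj == 0 are excluded entirely.
        logits = [
            dot(hidden[i], hidden[j]) / math.sqrt(d) + math.log(adj[i][j])
            for j in allowed
        ]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        out.append([
            sum(w / total * hidden[j][c] for w, j in zip(weights, allowed))
            for c in range(d)
        ])
    return out


random.seed(0)
tokens = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
A = sparse_causal_adjacency(tokens, top_k=2)
mixed = causal_masked_attention(tokens, A)
```

Unlike a plain lower-triangular causal mask, the learned adjacency lets a late token attend directly to a distant earlier token while ignoring closer but causally irrelevant ones. This is the kind of long-range link that the abstract says CoM's subchain structure can lose.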