Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning
Vis-CoT addresses the difficulty of verifying, debugging, and controlling LLM reasoning in high-stakes settings. It is an incremental advance that integrates human oversight into existing CoT methods.
The paper tackles the opacity of chain-of-thought reasoning in large language models by introducing Vis-CoT, a human-in-the-loop framework that converts CoT text into an interactive graph for visualization and intervention. On GSM8K and StrategyQA, it improves final-answer accuracy by up to 24 percentage points over non-interactive baselines.
Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a human-in-the-loop framework that converts linear CoT text into an interactive reasoning graph. Users can visualize the logical flow, identify flawed steps, and intervene by pruning incorrect paths and grafting new, user-defined premises. This shifts interaction from passive observation to active collaboration, steering models toward more accurate and trustworthy conclusions. Across GSM8K and StrategyQA, Vis-CoT improves final-answer accuracy by up to 24 percentage points over non-interactive baselines. A user study also shows large gains in perceived usability and trust. Vis-CoT points to a practical path for more reliable, understandable, and collaborative reasoning by combining LLMs with targeted human oversight.
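The prune-and-graft interaction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `ReasoningGraph` class, its method names, the one-step-per-line parsing rule, and the toy arithmetic problem are all assumptions made for illustration. The abstract does not specify how Vis-CoT segments CoT text or how it re-prompts the model after an edit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One reasoning step (hypothetical structure, not from the paper)."""
    id: int
    text: str
    parent: Optional[int] = None
    children: List[int] = field(default_factory=list)
    source: str = "model"  # "model" for generated steps, "user" for grafted ones


class ReasoningGraph:
    """Toy reasoning graph supporting pruning and grafting."""

    def __init__(self) -> None:
        self.steps: Dict[int, Step] = {}
        self._next_id = 0

    def add_step(self, text: str, parent: Optional[int] = None,
                 source: str = "model") -> int:
        sid = self._next_id
        self._next_id += 1
        self.steps[sid] = Step(sid, text, parent, source=source)
        if parent is not None:
            self.steps[parent].children.append(sid)
        return sid

    @classmethod
    def from_cot(cls, cot_text: str) -> "ReasoningGraph":
        # Assumption: each non-empty line is one step, chained linearly.
        graph = cls()
        parent = None
        for line in cot_text.splitlines():
            line = line.strip()
            if line:
                parent = graph.add_step(line, parent)
        return graph

    def prune(self, sid: int) -> List[int]:
        """Remove a flawed step and everything derived from it."""
        parent = self.steps[sid].parent
        if parent is not None:
            self.steps[parent].children.remove(sid)
        removed, stack = [], [sid]
        while stack:
            step = self.steps.pop(stack.pop())
            removed.append(step.id)
            stack.extend(step.children)
        return removed

    def graft(self, parent: int, text: str) -> int:
        """Attach a user-defined premise under an existing step."""
        return self.add_step(text, parent, source="user")

    def path_to(self, sid: int) -> List[str]:
        """Return step texts from the root down to the given step."""
        texts = []
        cur: Optional[int] = sid
        while cur is not None:
            texts.append(self.steps[cur].text)
            cur = self.steps[cur].parent
        return texts[::-1]


cot = """Tom has 3 boxes with 4 apples each.
3 + 4 = 7 apples in total.
So the answer is 7."""

graph = ReasoningGraph.from_cot(cot)
removed = graph.prune(1)  # drop the flawed step and its conclusion
fix = graph.graft(0, "3 * 4 = 12 apples in total.")
prompt_prefix = graph.path_to(fix)  # corrected prefix for continued generation
```

In this sketch the corrected path would be passed back to the model as a prefix so that it continues reasoning from the user's premise. That hand-off is also an assumption here, since the abstract only states that users can prune incorrect paths and graft new premises.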