CL · CV · Jan 21

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

arXiv:2601.14750v2 · 10 citations · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses efficiency and interpretability issues in large language models, which is relevant to both researchers and practitioners. The advance is incremental because it builds on existing vision-language models.

The paper tackles the computational overhead and lack of transparency in Chain-of-Thought prompting by introducing Render-of-Thought, a framework that renders textual reasoning steps into images, achieving 3-4x token compression and substantial inference acceleration while maintaining competitive performance on reasoning benchmarks.

Chain-of-Thought (CoT) prompting has achieved remarkable success in unlocking the reasoning capabilities of Large Language Models (LLMs). Although CoT prompting enhances reasoning, its verbosity imposes substantial computational overhead. Recent works often focus exclusively on outcome alignment and lack supervision on the intermediate reasoning process. These deficiencies obscure the analyzability of the latent reasoning chain. To address these challenges, we introduce Render-of-Thought (RoT), the first framework to reify the reasoning chain by rendering textual steps into images, making the latent rationale explicit and traceable. Specifically, we leverage the vision encoders of existing Vision Language Models (VLMs) as semantic anchors to align the vision embeddings with the textual space. This design ensures plug-and-play implementation without incurring additional pre-training overhead. Extensive experiments on mathematical and logical reasoning benchmarks demonstrate that our method achieves 3-4x token compression and substantial inference acceleration compared to explicit CoT. Furthermore, it maintains competitive performance against other methods, validating the feasibility of this paradigm. Our code is available at https://github.com/TencentBAC/RoT
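
The token-compression figure in the abstract can be read as a ratio between the length of the explicit textual chain and the number of visual tokens that replace it. The sketch below only illustrates that arithmetic. It assumes a ViT-style encoder that emits one token per non-overlapping square patch, which the abstract does not state. The function names, the 16-pixel patch size, the 512x16 strip, and the 120-token chain are all hypothetical values chosen for illustration, not numbers from the paper.

```python
import math


def patch_token_count(width_px: int, height_px: int, patch_px: int = 16) -> int:
    """Count the non-overlapping square patches covering an image.

    Assumption: one visual token per patch, as in a plain ViT-style encoder.
    """
    cols = math.ceil(width_px / patch_px)
    rows = math.ceil(height_px / patch_px)
    return rows * cols


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Average number of text tokens replaced by each visual token."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


# Hypothetical numbers, chosen only to illustrate the calculation.
text_tokens = 120                            # explicit CoT length in text tokens
visual_tokens = patch_token_count(512, 16)   # one 512x16 strip gives 32 patches
ratio = compression_ratio(text_tokens, visual_tokens)
print(f"{visual_tokens} visual tokens, {ratio:.2f}x compression")
# prints: 32 visual tokens, 3.75x compression
```

With these made-up inputs the ratio falls in the 3-4x range quoted above. The ratio for a real model depends on its tokenizer, rendering resolution, and vision encoder.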
