LLaVA-CoT
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
LLM reasoning / chain-of-thought · first seen Nov 15, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 paper beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites LLaVA-CoT as a baseline.
“However, these approaches, even advanced models like GPT-5 or Gemini, perform CoT in pure text space. Once visual features are initially encoded, they cannot be re-accessed during reasoning.”
— TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding
Beaten on benchmarks
Head-to-head results where a newer method reports beating LLaVA-CoT. Values are copied from the source paper's tables — verify against the cited paper.
- TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding
  TVI-CoT beats LLaVA-CoT on every reported metric (accuracy, TVI-CoT vs LLaVA-CoT):
  - MathVerse, overall: 60.2 vs 43.2
  - MathVerse, Plane Geometry: 66.7 vs 46.2
  - MathVerse, Solid Geometry: 58.3 vs 39.1
  - MathVista, overall: 76.6 vs 54.8
  - MathVista, Figure question answering: 73.2 vs 48.6
  - MathVista, Geometry problem solving: 80.5 vs 55.4
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 7, 2026