Is LLaVA-CoT superseded?

LLaVA-CoT (LLM reasoning / chain-of-thought) is superseded: newer methods cite it as a baseline and report beating it. One paper critiques it and one reports beating it on benchmarks, which places it at #55 of 772 most-superseded methods. Its sub-problem cluster is led by VL-Rethinker-7B. Newer alternatives in the same sub-problem include TVI-CoT.

Superseded baseline · #55 of 772 most-superseded

LLaVA-CoT

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

LLM reasoning / chain-of-thought · first seen Nov 15, 2024

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 1 beats it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites LLaVA-CoT as a baseline.

“However, these approaches, even advanced models like GPT-5 or Gemini, perform CoT in pure text space. Once visual features are initially encoded, they cannot be re-accessed during reasoning.”
— TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding

Beaten on benchmarks

Head-to-head results where a newer method reports beating LLaVA-CoT. Values are copied from the source paper's tables — verify against the cited paper.

TVI-CoT beats LLaVA-CoT on every listed metric. Source: TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding

Metric [benchmark] · TVI-CoT · LLaVA-CoT
MathVerse accuracy [MathVerse] · 60.2 · 43.2
MathVista accuracy [MathVista] · 76.6 · 54.8
Plane Geometry accuracy [MathVerse] · 66.7 · 46.2
Solid Geometry accuracy [MathVerse] · 58.3 · 39.1
Figure question answering accuracy [MathVista] · 73.2 · 48.6
Geometry problem solving accuracy [MathVista] · 80.5 · 55.4
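
To make the size of each gap easier to read, the head-to-head values listed above can be turned into absolute and relative margins. The following is a minimal sketch, not part of the source page: the names `REPORTED` and `margins` are hypothetical, and the scores are simply the figures quoted above, which should still be verified against the cited paper.

```python
# Scores copied from the head-to-head list above, as (TVI-CoT, LLaVA-CoT).
# REPORTED and margins() are illustrative names, not from any library.
REPORTED = {
    "MathVerse accuracy": (60.2, 43.2),
    "MathVista accuracy": (76.6, 54.8),
    "Plane Geometry accuracy": (66.7, 46.2),
    "Solid Geometry accuracy": (58.3, 39.1),
    "Figure question answering accuracy": (73.2, 48.6),
    "Geometry problem solving accuracy": (80.5, 55.4),
}


def margins(results):
    """Return {metric: (gain in accuracy points, gain relative to baseline in %)}."""
    out = {}
    for metric, (newer, baseline) in results.items():
        gain = newer - baseline
        # Round to one decimal to match the precision of the reported scores.
        out[metric] = (round(gain, 1), round(100 * gain / baseline, 1))
    return out


if __name__ == "__main__":
    for metric, (points, percent) in margins(REPORTED).items():
        print(f"{metric}: +{points} points ({percent}% relative to LLaVA-CoT)")
```

On these figures the absolute gap ranges from 17.0 points (MathVerse accuracy) to 25.1 points (Geometry problem solving accuracy). Relative gains are reported against the LLaVA-CoT score, so they depend on how low the baseline starts.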

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

TVI-CoT · TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding · Jun 7, 2026