Superseded baseline · #401 of 772 most-superseded

LLaVA-o1

LLM reasoning / chain-of-thought

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 0 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites LLaVA-o1 as a baseline.

“However, these approaches, even advanced models like GPT-5 or Gemini, perform CoT in pure text space. Once visual features are initially encoded, they cannot be re-accessed during reasoning.”
— TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding
Jun 7, 2026