Data Interpreter
Data Interpreter: An LLM Agent For Data Science
LLM reasoning / chain-of-thought · first seen Feb 28, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 paper beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Data Interpreter as a baseline.
“a critical limitation of this approach is its reliance on successful code execution as the sole proxy for correctness. This often leads to sub-optimal plans, as execution success does not guarantee logical accuracy or alignment with user intent.”
— DS-STAR: Data Science Agent via Iterative Planning and Verification
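The critique above can be illustrated with a toy case. The following is a minimal sketch, not code from DS-STAR or Data Interpreter: the task, the data, and every function name are hypothetical. It shows a generated solution that executes without error yet does not match the stated intent, so an execution-only check would accept it.

```python
# Hypothetical task: "average order value, excluding refunded orders".
orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 50.0, "refunded": True},
    {"amount": 30.0, "refunded": False},
]

def generated_solution(rows):
    # Runs without error, but ignores the "excluding refunded" requirement.
    return sum(r["amount"] for r in rows) / len(rows)

def intended_solution(rows):
    # What the user actually asked for: refunded orders are filtered out.
    kept = [r["amount"] for r in rows if not r["refunded"]]
    return sum(kept) / len(kept)

def execution_succeeds(fn, rows):
    # Execution-only check: passes as long as no exception is raised.
    try:
        fn(rows)
        return True
    except Exception:
        return False

ran_ok = execution_succeeds(generated_solution, orders)
matches_intent = generated_solution(orders) == intended_solution(orders)
print(ran_ok, matches_intent)  # True False
```

Here the generated code returns 60.0 while the intended answer is 65.0, so a check based only on successful execution passes a logically wrong result.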
Beaten on benchmarks
Head-to-head results where a newer method reports beating Data Interpreter. Values are copied from the source paper's tables — verify against the cited paper.
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats Data Interpreter · Easy [Gemini-2.5-Pro]
87.50 vs 72.22
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats Data Interpreter · Hard [Gemini-2.5-Pro]
45.24 vs 3.44
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats Data Interpreter · Total [Original setting]
44.69 vs 31.32
- DS-STAR: Data Science Agent via Iterative Planning and Verification
DS-STAR (Ours) beats Data Interpreter · Total [Oracle setting]
52.55 vs 33.57
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.