AutoGen
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
LLM reasoning / chain-of-thought · first seen Aug 16, 2023
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 1 paper beats it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating AutoGen. Values are copied from the source paper's tables — verify against the cited paper.
- DS-STAR: Data Science Agent via Iterative Planning and Verification
  DS-STAR (Ours) beats AutoGen in every listed setting (scores given as DS-STAR vs AutoGen):
  - Easy [Gemini-2.5-Pro]: 87.50 vs 59.72
  - Hard [Gemini-2.5-Pro]: 45.24 vs 10.32
  - Data Wrangling [Gemini-2.5-Pro]: 30.4 vs 25.6
  - ML [Gemini-2.5-Pro]: 57.3 vs 51.6
  - EDA [Gemini-2.5-Pro]: 34.8 vs 25.6
  - Medium [Gemini-2.5-Pro]: 35.2 vs 28.5
  - Total [Gemini-2.5-Pro]: 38.5 vs 30.8
  - Total [Original setting]: 44.69 vs 22.83
  - Total [Oracle setting]: 52.55 vs 31.77
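To make the size of each gap easier to compare, the head-to-head values listed above can be turned into absolute and relative margins. The following is a minimal sketch, not part of the source page: the `RESULTS` list and the `margins` helper are illustrative names, the numbers are copied from the entries above, and the page does not state the metric or benchmark behind each row, so margins from different settings may not be directly comparable.

```python
# (setting, DS-STAR score, AutoGen score), copied from the entries above.
# The metric and benchmark behind each row are not stated on this page.
RESULTS = [
    ("Easy [Gemini-2.5-Pro]", 87.50, 59.72),
    ("Hard [Gemini-2.5-Pro]", 45.24, 10.32),
    ("Data Wrangling [Gemini-2.5-Pro]", 30.4, 25.6),
    ("ML [Gemini-2.5-Pro]", 57.3, 51.6),
    ("EDA [Gemini-2.5-Pro]", 34.8, 25.6),
    ("Medium [Gemini-2.5-Pro]", 35.2, 28.5),
    ("Total [Gemini-2.5-Pro]", 38.5, 30.8),
    ("Total [Original setting]", 44.69, 22.83),
    ("Total [Oracle setting]", 52.55, 31.77),
]


def margins(results):
    """Return (setting, absolute gap in points, relative gain in %) per row."""
    out = []
    for setting, newer, baseline in results:
        gap = round(newer - baseline, 2)
        rel = round(100.0 * (newer - baseline) / baseline, 1)
        out.append((setting, gap, rel))
    return out


if __name__ == "__main__":
    for setting, gap, rel in margins(RESULTS):
        print(f"{setting}: +{gap:.2f} points ({rel:+.1f}% relative)")
```

Sorting the output of `margins(RESULTS)` by its second field shows which settings have the narrowest and widest reported gaps; as the page itself notes, the underlying values should be verified against the cited paper.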
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.