ARGUS
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
LLM reasoning / chain-of-thought · first seen May 29, 2025
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 1 paper beats it on benchmarks
Beaten on benchmarks
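Head-to-head results where a newer method reports beating ARGUS. Each entry lists the newer method's score first and ARGUS's score second; a higher Task Success Rate (TSR) is better, and a lower Attack Success Rate (ASR) is better. Values are copied from the source paper's tables, so verify them against the cited paper.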
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / MMLU / Qwen-Plus]
86.00 vs 81.50
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / MMLU / GPT-5-Nano]
77.75 vs 62.25
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / MMLU / GPT-3.5-Turbo]
76.75 vs 55.50
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / CSQA / Qwen-Plus]
81.00 vs 71.25
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / CSQA / GPT-5-Nano]
64.00 vs 29.75
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / CSQA / GPT-3.5-Turbo]
73.00 vs 40.75
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / LogiQA / Qwen-Plus]
75.00 vs 62.25
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / LogiQA / GPT-5-Nano]
45.25 vs 25.25
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Task Success Rate (TSR) [Independent Attacks / LogiQA / GPT-3.5-Turbo]
51.00 vs 15.00
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Attack Success Rate (ASR) [Independent Attacks / MMLU / Qwen-Plus]
8.00 vs 15.00
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Attack Success Rate (ASR) [Independent Attacks / MMLU / GPT-5-Nano]
15.00 vs 34.50
- Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification
STAR beats ARGUS · Attack Success Rate (ASR) [Independent Attacks / MMLU / GPT-3.5-Turbo]
11.25 vs 41.50