Search-R1 PPO
Tool use / function calling
present, but with little supersession signal in the knowledge base
0 papers critique it · 1 paper beats it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating Search-R1 PPO. Values are copied from the source paper's tables; verify them against the cited paper.
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · EM (Exact Match) [7B scale search datasets]
47.9 (CARL) vs 42.8 (Search-R1 PPO)
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · Avg_3 [7B scale average]
42.2 (CARL) vs 35.5 (Search-R1 PPO)
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · Tok (tokens per episode) [7B scale token efficiency]
1614 (CARL) vs 2098 (Search-R1 PPO); fewer tokens is better
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · EM (Exact Match) [3B scale search datasets]
40.4 (CARL) vs 30.2 (Search-R1 PPO)
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · Avg_3 [3B scale average]
35.4 (CARL) vs 24.8 (Search-R1 PPO)
- Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
CARL beats Search-R1 PPO · Tok (tokens per episode) [3B scale token efficiency]
1755 (CARL) vs 2142 (Search-R1 PPO); fewer tokens is better
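To make the size of each reported gap easier to compare across metrics, the margins can be recomputed from the values listed above. The following is a minimal sketch, not part of the cited paper: the names `RESULTS`, `LOWER_IS_BETTER`, `margin`, and `summarize` are hypothetical, and it assumes that in each pair the first value is CARL's and the second is Search-R1 PPO's, and that Tok is a lower-is-better metric while EM and Avg_3 are higher-is-better.

```python
# Values copied from the list above as (CARL, Search-R1 PPO).
# Verify against the cited paper before relying on them.
RESULTS = {
    ("7B", "EM"): (47.9, 42.8),
    ("7B", "Avg_3"): (42.2, 35.5),
    ("7B", "Tok"): (1614, 2098),
    ("3B", "EM"): (40.4, 30.2),
    ("3B", "Avg_3"): (35.4, 24.8),
    ("3B", "Tok"): (1755, 2142),
}

# Assumption: fewer tokens per episode is better; EM and Avg_3 are higher-is-better.
LOWER_IS_BETTER = {"Tok"}


def margin(metric, carl, baseline):
    """Return (absolute gain, relative gain in %) in CARL's favour."""
    gain = baseline - carl if metric in LOWER_IS_BETTER else carl - baseline
    return round(gain, 1), round(100 * gain / baseline, 1)


def summarize():
    """Map each (scale, metric) pair to its margin over Search-R1 PPO."""
    return {key: margin(key[1], *values) for key, values in RESULTS.items()}


if __name__ == "__main__":
    for (scale, metric), (abs_gain, rel_gain) in summarize().items():
        print(f"{scale} {metric}: {abs_gain:+} ({rel_gain:+}%)")
```

The relative gain is expressed against the Search-R1 PPO value, so a positive number always means CARL is ahead under the stated assumptions, regardless of the metric's direction.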
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.