RC-GRPO
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
Tool use / function calling · first seen Feb 3, 2026
current frontier — recent, not yet superseded in the knowledge base
0 papers critique it · 0 beat it on benchmarks
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.