AI, CL | Nov 2, 2024

Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage

Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen

arXiv:2411.01114v1 | 12 citations | h-index: 7

Originality: Highly original

AI Analysis

The paper addresses high API costs and limited autonomous reasoning in LLMs, problems that affect developers and researchers. It represents a strong, specific gain rather than a foundational advance.

The paper tackles two limitations of large language models: autonomously solving real-world engineering problems and reasoning through complex logic. It introduces the Infant Agent, which integrates task-aware functions, operators, hierarchical management, and memory retrieval. With the agent, GPT-4o's accuracy improves from 0.33% to 30% on SWE-bench-lite and from 13.3% to 37% on AIME-2024.

Despite the impressive capabilities of large language models (LLMs), they currently exhibit two primary limitations. (I) They struggle to autonomously solve real-world engineering problems. (II) They remain challenged in reasoning through complex logic problems. To address these challenges, we developed the Infant Agent, integrating task-aware functions, operators, a hierarchical management system, and a memory retrieval mechanism. Together, these components enable large language models to sustain extended reasoning processes and handle complex, multi-step tasks efficiently, all while significantly reducing API costs. Using the Infant Agent, GPT-4o's accuracy on the SWE-bench-lite dataset rises from 0.33% to 30%, and its accuracy on the AIME-2024 mathematics competition rises from 13.3% to 37%.
