CV · MM · Oct 26, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs

arXiv:2310.17796v3 · 70 citations · h-index: 64 · Has Code
Originality: Highly original
AI Analysis

ControlLLM addresses two problems faced by users who rely on LLMs for real-world applications: ambiguous prompts and inefficient tool scheduling. It is a novel method rather than an incremental improvement.

The paper tackles the problem of enabling large language models to effectively use multi-modal tools for complex tasks by introducing ControlLLM, a framework that improves accuracy, efficiency, and versatility in tool invocation, as demonstrated in evaluations on image, audio, and video processing tasks.

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods. The code is at https://github.com/OpenGVLab/ControlLLM.
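
To make the idea of searching a tool graph concrete, here is a minimal Python sketch. It is not the paper's Thoughts-on-Graph implementation, and this listing does not describe the actual search strategies. The sketch assumes that each tool declares the resource types it consumes and the one type it produces, and it runs a plain depth-first search for tool sequences that turn the available inputs into a target type. The `Tool` class, the toy tool names, and the functions `search_paths` and `shortest_solution` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool node: consumes some resource types, produces one."""
    name: str
    inputs: tuple   # resource types the tool requires
    output: str     # resource type the tool produces


# Toy tool graph (illustrative only, not the ControlLLM toolbox).
TOOLS = [
    Tool("image_captioning", ("image",), "text"),
    Tool("object_detection", ("image",), "boxes"),
    Tool("text_to_speech", ("text",), "audio"),
]


def search_paths(tools, available, target, max_depth=4):
    """Depth-first search for tool sequences that produce `target`."""
    solutions = []

    def dfs(have, path):
        if target in have:
            solutions.append([tool.name for tool in path])
            return
        if len(path) >= max_depth:
            return
        for tool in tools:
            # A tool is usable when all of its inputs are available;
            # skip tools whose output type we already have.
            if set(tool.inputs) <= have and tool.output not in have:
                dfs(have | {tool.output}, path + [tool])

    dfs(frozenset(available), [])
    return solutions


def shortest_solution(tools, available, target):
    """Pick the shortest candidate path, or None if no path exists."""
    paths = search_paths(tools, available, target)
    return min(paths, key=len) if paths else None


if __name__ == "__main__":
    print(shortest_solution(TOOLS, {"image"}, "audio"))
```

Choosing the shortest path is only one possible notion of "optimal", used here as a stand-in; a real system might instead rank candidate paths by tool quality, cost, or a model's judgment.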

Code Implementations: 1 repo
