DC CL Jun 10, 2024

LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Simranjit Singh, Michael Fore, Andreas Karatzas, Chaehong Lee, Yanan Jian, Longfei Shangguan, Fuxun Yu, Iraklis Anagnostopoulos, Dimitrios Stamoulis

arXiv:2406.06799v2, 5.18 citations

Originality: Incremental advance

AI Analysis

The work addresses efficiency issues for developers and users of large-scale LLM systems, but it is incremental because it builds on existing tool-augmented LLM and caching mechanisms.

The paper tackles the problem of high overhead from data operations in tool-augmented LLMs by introducing LLM-dCache, which allows LLMs to manage cache decisions via prompting, resulting in an average 1.24x improvement in Copilot times across various LLMs and prompting techniques.

As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
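The idea of exposing cache operations as callable tools can be illustrated with a minimal sketch. This is not the authors' implementation: the tool names (`load_db`, `read_cache`, `update_cache`), the `dispatch` helper, the tool-call dictionary shape, and the LRU eviction policy are all assumptions made for illustration, and no real LLM endpoint is called.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List


def load_from_source(key: str) -> str:
    """Hypothetical stand-in for a slow data load (database, object store)."""
    return f"data-for-{key}"


class ToolCache:
    """Small LRU cache. The eviction policy is an assumption of this sketch."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, Any]" = OrderedDict()

    def read(self, key: str) -> Any:
        # Raises KeyError on a miss; a hit marks the key as most recently used.
        self.entries.move_to_end(key)
        return self.entries[key]

    def update(self, key: str, value: Any) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        # Evict least recently used entries once capacity is exceeded.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


cache = ToolCache(capacity=2)


def load_db(key: str) -> str:
    """Tool: fetch data from the slow source, bypassing the cache."""
    return load_from_source(key)


def read_cache(key: str) -> str:
    """Tool: fetch data from the cache (the agent chooses this on a hit)."""
    return cache.read(key)


def update_cache(key: str) -> str:
    """Tool: load data from the source and store it in the cache."""
    cache.update(key, load_from_source(key))
    return "ok"


# Registry of tools exposed to the agent (names are hypothetical).
TOOLS: Dict[str, Callable[..., Any]] = {
    "load_db": load_db,
    "read_cache": read_cache,
    "update_cache": update_cache,
}


def cached_keys() -> List[str]:
    """Cache contents, which would be placed in the prompt so the model can decide."""
    return list(cache.entries.keys())


def dispatch(call: Dict[str, Any]) -> Any:
    """Execute one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    return TOOLS[call["name"]](**call["arguments"])


if __name__ == "__main__":
    # Simulated sequence of tool calls that a model might emit.
    dispatch({"name": "update_cache", "arguments": {"key": "tile-1"}})
    print(cached_keys())
    print(dispatch({"name": "read_cache", "arguments": {"key": "tile-1"}}))
```

In this sketch the cache logic sits behind ordinary tool functions, so the agent framework needs no special handling: the model sees the current keys in its prompt and chooses between `read_cache` and `load_db` the same way it chooses any other tool.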
