Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
This work addresses performance optimization in LLM serving for organizations deploying AI systems. The contribution is incremental: it builds on existing tuning methods and focuses specifically on black-box approaches.
The paper introduces a black-box online controller for LLM serving that uses end-to-end measurements and hill climbing to maximize goodput, that is, the throughput of requests that meet the service-level objective, and it reports empirical evidence in support of this design. It also advocates including system performance and sustainability metrics in AI Factsheets to strengthen trust in AI adoption.
In this paper, we present a novel black-box online controller that applies hill climbing to end-to-end measurements taken over short segments, with no internal instrumentation, in order to maximize goodput, defined as the throughput of requests that satisfy the service-level objective. We provide empirical evidence that this design is well-founded. Using this advance in LLM serving as a concrete example, we then discuss the importance of integrating system performance and sustainability metrics into Factsheets for organizations adopting AI systems.
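To make the control loop concrete, the following is a minimal sketch of goodput-driven hill climbing, not the paper's implementation. It assumes a single integer tuning knob (for example a concurrency limit), a hypothetical `measure` callback that runs one short serving segment at a given knob value and returns the observed goodput, and a fixed step size; none of these names or choices come from the paper. Measurement noise, which a real controller would have to handle, is ignored here.

```python
from typing import Callable, List


def goodput(latencies_s: List[float], slo_s: float, segment_s: float) -> float:
    """Requests per second that met the SLO during one measurement segment."""
    within_slo = sum(1 for latency in latencies_s if latency <= slo_s)
    return within_slo / segment_s


def hill_climb(
    measure: Callable[[int], float],  # hypothetical: runs one segment, returns goodput
    start: int,
    lo: int,
    hi: int,
    step: int = 1,
    max_segments: int = 50,
) -> int:
    """Move the knob to a neighbouring value whenever that improves goodput."""
    current = start
    best = measure(current)
    segments = 1
    while segments < max_segments:
        improved = False
        # Probe the neighbour above, then the neighbour below.
        for candidate in (current + step, current - step):
            if not lo <= candidate <= hi or segments >= max_segments:
                continue
            score = measure(candidate)
            segments += 1
            if score > best:
                current, best = candidate, score
                improved = True
                break
        if not improved:
            break  # neither neighbour is better: local optimum reached
    return current


def toy_measure(knob: int) -> float:
    """Assumed toy response curve with a single peak at knob == 12."""
    return 200.0 - (knob - 12) ** 2


if __name__ == "__main__":
    print(goodput([0.1, 0.5, 1.2, 0.3], slo_s=0.5, segment_s=2.0))
    print(hill_climb(toy_measure, start=4, lo=1, hi=32))
```

The controller only sees the value returned by `measure`, which is what makes it black-box: swapping the toy curve for a function that drives a real server for one segment and calls `goodput` on the collected latencies would not change the search logic.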