LLM Inference Serving: Survey of Recent Advances and Opportunities
This survey addresses the problem of deploying and scaling LLMs efficiently for practitioners, but its contribution is incremental: it reviews existing research rather than introducing new methods.
This survey reviews advances in LLM serving systems since 2023, focusing on system-level enhancements that improve performance and efficiency without changing the core decoding mechanisms. It is intended as a resource for practitioners keeping up with this fast-evolving field.
This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and systems venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.