Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead
This work addresses the need for structured software engineering approaches in LLM development for researchers and practitioners, but its contribution is incremental: it synthesizes existing knowledge without introducing new methods.
The paper tackles the lack of systematic exploration of software engineering challenges in large language model development by analyzing research status across six lifecycle phases, identifying key challenges and proposing research directions to facilitate future advances.
The rapid advancement of large language models (LLMs) has redefined artificial intelligence (AI), pushing the boundaries of AI research and opening new possibilities for both academia and industry. However, LLM development faces increasingly complex challenges throughout its lifecycle, and no existing research systematically explores these challenges and their solutions from the perspective of software engineering (SE). To fill this gap, we systematically analyze the research status throughout the LLM development lifecycle, which we divide into six phases: requirements engineering, dataset construction, model development and enhancement, testing and evaluation, deployment and operations, and maintenance and evolution. We then identify the key challenges for each phase and present potential research directions to address them. Overall, we provide insights from an SE perspective to facilitate future advances in LLM development.