An Empirical Study of OpenAI API Discussions on Stack Overflow
The study addresses practical problems that developers and vendors face when using LLM APIs, but its contribution is incremental because it applies existing empirical methods to a new dataset.
The study addresses the lack of empirical research on the challenges developers face when using OpenAI APIs by analyzing 2,874 Stack Overflow discussions, identifying specific challenges within nine categories, and proposing implications for stakeholders.
The rapid advancement of large language models (LLMs), represented by OpenAI's GPT series, has significantly impacted various domains such as natural language processing, software development, education, healthcare, finance, and scientific research. However, OpenAI APIs introduce unique challenges that differ from those of traditional APIs, such as the complexities of prompt engineering, token-based cost management, non-deterministic outputs, and black-box behavior. To the best of our knowledge, the challenges developers encounter when using OpenAI APIs have not been explored in previous empirical studies. To fill this gap, we conduct the first comprehensive empirical study by analyzing 2,874 OpenAI API-related discussions from the popular Q&A forum Stack Overflow. We first examine the popularity and difficulty of these posts. After manually categorizing them into nine OpenAI API-related categories, we identify specific challenges associated with each category through topic modeling analysis. Based on our empirical findings, we propose actionable implications for developers, LLM vendors, and researchers.
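As a rough illustration of what examining the popularity and difficulty of posts can involve, the sketch below computes a few proxy metrics over a list of posts. The abstract does not define its metrics, so the choices here (average views and score for popularity; share of posts without an accepted answer and median hours to an accepted answer for difficulty) are assumptions, and the `Post` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional


@dataclass
class Post:
    """Hypothetical record for one Stack Overflow question (fields are assumptions)."""
    views: int
    score: int
    has_accepted_answer: bool
    hours_to_accepted: Optional[float] = None  # None when no answer was accepted


def popularity(posts: List[Post]) -> dict:
    """Assumed popularity proxies: average view count and average score."""
    return {
        "avg_views": mean(p.views for p in posts),
        "avg_score": mean(p.score for p in posts),
    }


def difficulty(posts: List[Post]) -> dict:
    """Assumed difficulty proxies: share of posts without an accepted answer,
    and median hours until an answer was accepted (None if no post has one)."""
    without_accepted = sum(1 for p in posts if not p.has_accepted_answer)
    times = [
        p.hours_to_accepted
        for p in posts
        if p.has_accepted_answer and p.hours_to_accepted is not None
    ]
    return {
        "pct_without_accepted": 100.0 * without_accepted / len(posts),
        "median_hours_to_accepted": median(times) if times else None,
    }
```

Both functions expect a non-empty list of posts. Comparing these numbers across categories, or against a baseline set of posts, would be one way to read them, though the study may define and compare its metrics differently.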