Programming Language Co-Usage Patterns on Stack Overflow: Analysis of the Developer Ecosystem
For researchers and practitioners in software engineering and developer ecosystem analysis, this paper provides a comprehensive, data-driven characterization of language co-usage patterns, though the methods are established and the findings are largely descriptive.
This paper analyzes how developers combine programming languages by mining Stack Overflow posts across 186 languages, using FP-Growth, LDA, and Louvain community detection. It identifies tight coupling clusters (e.g., shell/bash, Swift/Objective-C), 25 developer profiles, and three macro-communities (web/enterprise, Apple, systems/scientific), with all methods converging on the same ecosystem structure.
Understanding how developers combine programming languages in practice reveals the hidden structure of the software ecosystem: which languages are used as complements, which define coherent technology stacks, and which bridge disparate communities. We present a three-phase empirical pipeline that mines Stack Overflow posts by hundreds of thousands of developers across 186 programming languages. The pipeline applies FP-Growth frequent itemset mining, Latent Dirichlet Allocation (LDA) topic modeling, and Louvain community detection on a weighted co-usage graph, with the goal of characterizing co-usage coupling, latent developer specializations, and macro-level ecosystem structure simultaneously from behavioral data. FP-Growth identifies tight coupling clusters such as shell/bash, Swift/Objective-C, and the C-family, with lift values far exceeding what individual language popularity predicts. LDA produces 25 developer profiles, including Apple-platform developers, scientific and hardware programmers, functional/academic programmers, and two distinct Unix scripting sub-profiles. Louvain partitions the language graph into three macro-communities (web/enterprise, Apple ecosystem, and systems/scientific) and identifies Java as the highest-degree hub connecting all three. All three methods independently converge on the same ecosystem structure, providing strong cross-method validation of the findings.
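To make the lift measure mentioned above concrete, the following is a minimal sketch of how pairwise lift could be computed from per-developer language sets. It is not the paper's implementation: the function name `pairwise_lift` and the toy data are hypothetical, and a full FP-Growth run would also mine itemsets larger than pairs. Lift for a pair (A, B) is support(A, B) / (support(A) × support(B)), so a value above 1 means the two languages co-occur more often than their individual popularity would predict.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(developer_languages):
    """Return {(lang_a, lang_b): lift} for every co-occurring language pair.

    developer_languages: iterable of sets, one set of languages per developer.
    Pair keys are sorted alphabetically so each pair appears once.
    """
    transactions = [sorted(set(langs)) for langs in developer_languages]
    n = len(transactions)

    single_counts = Counter()
    pair_counts = Counter()
    for langs in transactions:
        single_counts.update(langs)
        pair_counts.update(combinations(langs, 2))

    # lift = P(a, b) / (P(a) * P(b)) = count_ab * n / (count_a * count_b)
    return {
        (a, b): count * n / (single_counts[a] * single_counts[b])
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy data, invented purely for illustration.
toy_developers = [
    {"swift", "objective-c"},
    {"swift", "objective-c"},
    {"java", "javascript"},
    {"java", "python"},
    {"java", "swift"},
    {"python", "javascript"},
]

lift = pairwise_lift(toy_developers)
for pair, value in sorted(lift.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

In this toy data, Swift and Objective-C get a lift of 2.0 because they co-occur twice as often as independence would predict. Java and Swift are both popular here but rarely appear together, so their lift falls below 1.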