How Scale Affects Structure in Java Programs
The work provides incremental insights for software metrics research, helping software developers and researchers distinguish between 'programming in the small' and 'programming in the large'.
The paper analyzes a large collection of 30,911 Java programs and uncovers previously unknown size-related super- and sublinear effects. It shows how program characteristics vary disproportionately with size, which enables better normalization of software metrics.
Many internal software metrics and external quality attributes of Java programs correlate strongly with program size. This knowledge has been used pervasively in quantitative studies of software through practices such as normalization on size metrics. This paper reports size-related super- and sublinear effects that were previously unknown. Findings obtained on a very large collection of Java programs (30,911 projects hosted at Google Code as of Summer 2011) reveal how certain characteristics of programs vary disproportionately with program size, sometimes even non-monotonically. Many of the specific parameters of the nonlinear relations are reported. These results give further insight into the differences between "programming in the small" and "programming in the large." The reported findings carry important consequences for OO software metrics, and for software research in general: metrics that are known to correlate with size can now be properly normalized so that all the information left in them is size-independent.
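To make the idea of size normalization concrete, the following is a minimal sketch, not the paper's method. It assumes, purely for illustration, that a metric follows a power law `value = a * size^b`, so that `b > 1` would be a superlinear effect and `b < 1` a sublinear one. The function names `fit_power_law` and `normalize` and the data are hypothetical, and the sketch does not cover the non-monotonic relations mentioned above.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of log(value) = log(a) + b * log(size).

    Returns (a, b). Assumes all sizes and values are positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # exponent: >1 superlinear, <1 sublinear
    a = math.exp(mean_y - b * mean_x)     # multiplicative constant
    return a, b

def normalize(size, value, a, b):
    """Divide out the fitted size trend; what remains is the size-independent part."""
    return value / (a * size ** b)

# Synthetic data for illustration only (not taken from the paper):
# a metric that grows superlinearly with program size.
sizes = [10, 100, 1000, 10000]
values = [2 * s ** 1.2 for s in sizes]

a, b = fit_power_law(sizes, values)
normalized = [normalize(s, v, a, b) for s, v in zip(sizes, values)]
```

Under this assumption, dividing a metric by plain size would leave a residual trend of `size^(b - 1)`, whereas dividing by the fitted `a * size^b` removes it.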