Measuring source code conciseness across programming languages using compression
This work addresses the need for consistent metrics in multi-language software development, though its contribution is incremental: it applies established compression techniques to an existing challenge.
The paper tackles the problem of objectively measuring source code conciseness across programming languages by developing a model based on information theory. Using quantitative results from a large benchmark of commercial software, it demonstrates that the resulting metric correlates strongly with an alternative analytical approach and with a large-scale developer survey.
It is well known, and often a topic of heated debate, that programs in some programming languages are more concise than in others. This is a relevant factor when comparing or aggregating volume-impacted metrics on source code written in a combination of programming languages. In this paper, we present a model for measuring the conciseness of programming languages in a consistent, objective and evidence-based way. We present the approach, explain how it is founded on information-theoretic principles, present detailed analysis steps and show the quantitative results of applying this model to a large benchmark of diverse commercial software applications. We demonstrate that our metric for language conciseness is strongly correlated with both an alternative analytical approach and a large-scale developer survey, and show how its results can be applied to improve software metrics for multi-language applications.
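The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea, assuming that compressed size can serve as a rough proxy for the information content of source code. The choice of zlib, the function names, and the per-language factors in `normalized_volume` are all illustrative assumptions, not the paper's actual method or results.

```python
import zlib


def compressed_size(source: str) -> int:
    """Size in bytes of the zlib-compressed source (assumed proxy for information content)."""
    return len(zlib.compress(source.encode("utf-8"), 9))


def information_density(source: str) -> float:
    """Compressed bytes per raw byte; lower values suggest more redundant (less concise) code."""
    raw = len(source.encode("utf-8"))
    if raw == 0:
        return 0.0
    return compressed_size(source) / raw


def information_per_line(source: str) -> float:
    """Compressed bytes per non-blank line, a hypothetical per-line conciseness figure."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return compressed_size(source) / len(lines)


def normalized_volume(lines_by_language: dict, factor_by_language: dict) -> float:
    """Sum line counts after weighting each language by a (hypothetical) conciseness factor."""
    return sum(
        count * factor_by_language[lang]
        for lang, count in lines_by_language.items()
    )


if __name__ == "__main__":
    repetitive = "x = x + 1\n" * 200
    varied = "".join(f"v{i} = f{i * 7}(a{i})\n" for i in range(200))
    print(information_density(repetitive), information_density(varied))
    # The factors below are made-up placeholders, not measured values.
    print(normalized_volume({"java": 1000, "python": 400}, {"java": 0.5, "python": 1.0}))
```

In this sketch, highly repetitive code compresses well and so carries little information per line, which is the intuition behind using compression to compare languages. A per-language factor derived from such measurements could then be used to weight line counts before aggregating volume-based metrics across a multi-language application.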