Frequency Distribution of Error Messages
Fitting a distribution to error message frequencies provides a quantitative way to contrast languages or compilers, which could aid in writing error explanations for novices, though the contribution is incremental in nature.
The study investigated the frequency distribution of programming error messages in Python and Java, finding that they empirically resemble Zipf-Mandelbrot distributions, with parameters fitted using maximum-likelihood estimation.
Which programming error messages are the most common? We investigate this question, motivated by writing error explanations for novices. We consider large data sets in Python and Java that include both syntax and run-time errors. In both data sets, after grouping essentially identical messages, the error message frequencies empirically resemble Zipf-Mandelbrot distributions. We use a maximum-likelihood approach to fit the distribution parameters. This gives one possible way to contrast languages or compilers quantitatively.
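To make the fitting step concrete, here is a minimal sketch of a maximum-likelihood fit for a Zipf-Mandelbrot distribution over ranked message counts. It assumes the common parameterization in which the message at rank k has probability proportional to (k + q)^(-s), normalized over the N observed ranks. The names s, q, zm_log_likelihood and fit_zipf_mandelbrot are illustrative, and the shrinking grid search is a simple stand-in optimizer, not necessarily the procedure used in the study.

```python
import math


def zm_log_likelihood(counts, s, q):
    """Log-likelihood of rank-ordered counts under Zipf-Mandelbrot(s, q).

    counts[0] is the count of the most frequent message (rank 1), and so on.
    Assumed model: P(rank k) = (k + q)^(-s) / sum_i (i + q)^(-s).
    """
    n = len(counts)
    log_weights = [-s * math.log(rank + q) for rank in range(1, n + 1)]
    log_norm = math.log(sum(math.exp(w) for w in log_weights))
    return sum(c * (w - log_norm) for c, w in zip(counts, log_weights))


def fit_zipf_mandelbrot(counts, s_max=5.0, q_max=20.0, steps=40, rounds=6):
    """Approximate the MLE of (s, q) with a grid search that shrinks each round.

    The search bounds s_max and q_max are assumptions chosen for illustration.
    """
    counts = sorted(counts, reverse=True)  # rank messages by frequency
    s_lo, s_hi, q_lo, q_hi = 0.0, s_max, 0.0, q_max
    best = None  # (log-likelihood, s, q) of the best point seen so far
    for _ in range(rounds):
        for i in range(steps + 1):
            s = s_lo + (s_hi - s_lo) * i / steps
            for j in range(steps + 1):
                q = q_lo + (q_hi - q_lo) * j / steps
                ll = zm_log_likelihood(counts, s, q)
                if best is None or ll > best[0]:
                    best = (ll, s, q)
        # Halve the search window and recenter it on the best point so far.
        _, s_best, q_best = best
        s_half = (s_hi - s_lo) / 4
        q_half = (q_hi - q_lo) / 4
        s_lo, s_hi = max(0.0, s_best - s_half), s_best + s_half
        q_lo, q_hi = max(0.0, q_best - q_half), q_best + q_half
    return best[1], best[2]
```

Given a list of per-message counts (after grouping essentially identical messages), fit_zipf_mandelbrot(counts) returns an estimated (s, q) pair. Fitting each language's data set separately and comparing the resulting pairs is one way to read the quantitative contrast the abstract mentions. A gradient-based optimizer would likely be faster on large data sets; the grid search is used here only to keep the sketch free of dependencies.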