Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?
This work highlights a critical limitation of using off-the-shelf personality detection tools in software engineering research and indicates the need for domain-specific tools to ensure reliable results.
The study evaluated general-purpose personality detection tools on software developers' emails from the Apache Software Foundation and found low prediction accuracy and disagreement among the tools. Replicating two previous studies with a different tool led to diverging conclusions.
Assessing the personality of software engineers may help to match individual traits with the characteristics of development activities such as code review and testing, and may support managers in team composition. However, self-assessment questionnaires are not a practical way to collect multiple observations on a large scale. Automatic personality detection overcomes this limitation, but it relies on off-the-shelf solutions trained on non-technical corpora, which might not be readily applicable to technical domains like Software Engineering (SE). In this paper, we first assess the performance of general-purpose personality detection tools when applied to a technical corpus of developers' emails retrieved from the public archives of the Apache Software Foundation. We observe generally low prediction accuracy and overall disagreement among the tools. Second, we replicate two previous SE research studies, replacing the personality detection tool originally used to infer developers' personalities from pull-request discussions and emails. We observe that the original results are not confirmed, i.e., changing the tool used in the original study leads to diverging conclusions. Our results suggest a need for personality detection tools specifically targeted at the software engineering domain.
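As an illustration of the two checks described above, accuracy against self-reported scores and agreement among tools, the following is a minimal sketch. It assumes each tool outputs a numeric score per developer for a single trait. The function names, tool names, and scores are hypothetical, and the metrics (mean absolute error and Pearson correlation) are common choices rather than necessarily those used in the study.

```python
from itertools import combinations
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute gap between tool scores and reference scores."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def pairwise_agreement(scores_by_tool):
    """Correlation for every pair of tools, keyed by (tool_a, tool_b)."""
    return {
        (a, b): pearson(scores_by_tool[a], scores_by_tool[b])
        for a, b in combinations(sorted(scores_by_tool), 2)
    }


if __name__ == "__main__":
    # Hypothetical scores for one trait on a 1-5 scale, five developers.
    self_reported = [3.0, 4.0, 2.0, 5.0, 1.0]
    scores_by_tool = {
        "tool_a": [3.5, 3.5, 3.0, 4.0, 3.0],
        "tool_b": [2.5, 3.0, 3.5, 3.0, 2.5],
    }
    for name, scores in scores_by_tool.items():
        print(name, "MAE vs self-report:", mean_absolute_error(scores, self_reported))
    for pair, r in pairwise_agreement(scores_by_tool).items():
        print(pair, "correlation:", r)
```

A high error against the self-reported scores would indicate low accuracy, while a low or negative correlation between two tools would indicate that they disagree on how developers rank on the trait.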