SE ME | Jul 23, 2021

Applying Inter-rater Reliability and Agreement in Grounded Theory Studies in Software Engineering

Jessica Díaz, Jorge Pérez, Carolina Gallardo, Ángel González-Prieto

arXiv:2107.11449v16.46 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the inconsistent use of qualitative research methods among software engineering researchers. Its contribution is incremental: it formalizes existing practices rather than introducing a new paradigm.

The paper tackles the lack of guidelines for applying Inter-Rater Reliability and Agreement in Grounded Theory studies in software engineering, presenting a systematic process that improves research rigor by ensuring consensus among multiple raters during iterative coding.

In recent years, qualitative research in empirical software engineering that applies Grounded Theory has been increasing. Grounded Theory (GT) is a technique for developing theory inductively and iteratively from qualitative data; its main characteristics are theoretical sampling, coding, constant comparison, memoing, and saturation. Large or controversial GT studies may involve multiple researchers in collaborative coding, which requires a kind of rigor and consensus that an individual coder does not. Although many qualitative researchers reject quantitative measures in favor of other qualitative criteria, many others are committed to measuring consensus through Inter-Rater Reliability (IRR) and/or Inter-Rater Agreement (IRA) techniques to develop a shared understanding of the phenomenon being studied. However, there are no specific guidelines about how and when to apply IRR/IRA during the iterative process of GT, so researchers have been using ad hoc methods for years. This paper presents a process for systematically applying IRR/IRA in GT studies that fits the iterative nature of this qualitative research method and is supported by a previous systematic literature review on applying IRR/IRA in GT studies in software engineering. This process allows researchers to incrementally generate a theory while ensuring consensus on the constructs that support it, thus improving the rigor of qualitative research. This formalization helps researchers apply IRR/IRA to GT studies when multiple raters are involved in coding. Measuring consensus among raters promotes communicability, transparency, reflexivity, replicability, and trustworthiness of the research.
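
To make "measuring consensus" concrete, the sketch below computes two common two-rater statistics: raw percent agreement and Cohen's kappa, which corrects for chance agreement. The abstract does not say which coefficient the paper's process uses, so the choice of kappa here is an assumption for illustration only. The function names, the code labels ("culture", "tooling", "process"), and the sample data are all hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Fraction of units to which both raters assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(codes_a)
    p_o = percent_agreement(codes_a, codes_b)  # observed agreement
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Expected agreement if each rater assigned codes independently,
    # following their own marginal code frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical code everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: codes assigned by two raters to the same six quotations.
rater_1 = ["culture", "tooling", "culture", "process", "tooling", "culture"]
rater_2 = ["culture", "tooling", "process", "process", "tooling", "culture"]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.3f}")
```

In an iterative setting such as the one the abstract describes, a statistic like this could be recomputed after each coding round, with raters discussing disagreements and refining the codebook before the next round. The acceptance threshold and the coefficient itself are choices the research team would need to justify.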
