Web(er) of Hate: A Survey on How Hate Speech Is Typed
This work addresses dataset reliability issues for researchers in hate speech detection, but it is incremental, as it builds on existing survey and theoretical frameworks.
The paper examines methodological choices in hate speech dataset curation, highlighting their impact on reliability, and advocates for a reflexive approach based on Max Weber's ideal types to improve transparency and rigor.
The curation of hate speech datasets involves complex design decisions that balance competing priorities. This paper critically examines these methodological choices across a diverse range of datasets, highlighting common themes and practices and their implications for dataset reliability. Drawing on Max Weber's notion of ideal types, we argue for a reflexive approach to dataset creation, one in which researchers acknowledge their own value judgments during dataset construction and thereby foster transparency and methodological rigour.