Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media
This paper addresses the problem of inaccurate monitoring of hate speech targeting LGBTQ+ communities on social media and highlights the limitations of current methods.
The paper investigates how sociocultural factors affect hate speech detection systems for anti-LGBTQ+ content on social media. It finds that the social and cultural alignment of open-source training data influences model predictions, and that keyword-based data collection causes models to overfit on slurs, so anti-LGBTQ+ content may go undetected.
The purpose of this paper is to ascertain the influence of sociocultural factors (i.e., social, cultural, and political factors) on the development of hate speech detection systems. We set out to investigate the suitability of using open-source training data to monitor levels of anti-LGBTQ+ content on social media across different national varieties of English. Our findings suggest that the social and cultural alignment of open-source hate speech data sets influences the predicted outputs. Furthermore, the keyword-search approach, based on anti-LGBTQ+ slurs, used in the development of open-source training data encourages detection models to overfit on slurs; therefore, anti-LGBTQ+ content may go undetected. We recommend combining empirical outputs with qualitative insights to ensure these systems are fit for purpose.