CY LG | Mar 15, 2022

Sex Trouble: Common pitfalls in incorporating sex/gender in medical machine learning and how to avoid them

arXiv:2203.08227v12.316 citations | h-index: 9

Originality: Synthesis-oriented

AI Analysis

This paper addresses methodological issues for medical machine learning researchers, particularly those using electronic health record data. Its contribution is incremental: it builds on existing critiques without introducing new technical methods.

The paper tackles the problem of false assumptions about sex and gender in medical machine learning, identifying three common pitfalls and offering recommendations to avoid perpetuating them and better serve all patients.

False assumptions about sex and gender are deeply embedded in the medical system, including that they are binary, static, and concordant. Machine learning researchers must understand the nature of these assumptions in order to avoid perpetuating them. In this perspectives piece, we identify three common mistakes that researchers make when dealing with sex/gender data: "sex confusion", the failure to identify what sex in a dataset does or doesn't mean; "sex obsession", the belief that sex, specifically sex assigned at birth, is the relevant variable for most applications; and "sex/gender slippage", the conflation of sex and gender even in contexts where only one or the other is known. We then discuss how these pitfalls show up in machine learning studies based on electronic health record data, which is commonly used for everything from retrospective analysis of patient outcomes to the development of algorithms to predict risk and administer care. Finally, we offer a series of recommendations about how machine learning researchers can produce both research and algorithms that more carefully engage with questions of sex/gender, better serving all patients, including transgender people.
