LG CR
Nov 14, 2021

Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery

Odhran O'Donoghue, Paul Duckworth, Giuseppe Ughi, Linus Scheibenreif, Kia Khezeli, Adrienne Hoarfrost, Samuel Budd, Patrick Foley, Nicholas Chia, John Kalantari, Graham Mackintosh, Frank Soboczenski

arXiv:2111.07348v2 | 1.6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in human medical research, a problem facing scientists and clinicians. It is incremental: it builds on existing IRM methods, and the authors note that further work is needed for conclusive insights.

The authors address limited human medical data by augmenting small human datasets with in-vitro and animal model data. They apply Invariant Risk Minimisation (IRM), treating cross-organism data as coming from different data-generating environments, to identify invariant features. The resulting models identify genes relevant to human cancer development, with some consistency observed across varying amounts of human and mouse data.

Human medical data can be challenging to obtain due to data privacy concerns, difficulties conducting certain types of experiments, or prohibitive associated costs. In many settings, data from animal models or in-vitro cell lines are available to help augment our understanding of human data. However, these data are known to have low etiological validity in comparison to human data. In this work, we augment small human medical datasets with in-vitro and animal model data. We use Invariant Risk Minimisation (IRM) to elucidate invariant features by considering cross-organism data as belonging to different data-generating environments. Our models identify genes of relevance to human cancer development. We observe a degree of consistency across varying amounts of human and mouse data; however, further work is required to obtain conclusive insights. As a secondary contribution, we enhance existing open-source datasets and provide two uniformly processed, cross-organism, homologue gene-matched datasets to the community.
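To make the idea of treating organisms as separate data-generating environments concrete, here is a minimal sketch of an IRM-style objective. It assumes the commonly used IRMv1 penalty with a squared loss and a linear predictor, which may differ from the authors' actual setup. The synthetic "human" and "mouse" environments, the feature layout, and all function and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np


def irm_penalty(preds, targets):
    """IRMv1-style penalty for squared loss (an assumed form, not the paper's code).

    The risk of a scaled predictor is R(w) = mean((w * preds - targets)^2).
    The penalty is the squared gradient of R with respect to w at w = 1.
    """
    grad = np.mean(2.0 * (preds - targets) * preds)
    return grad ** 2


def irm_objective(weights, environments, penalty_weight=1.0):
    """Sum over environments of the risk plus the weighted invariance penalty."""
    total = 0.0
    for X, y in environments:
        preds = X @ weights
        risk = np.mean((preds - y) ** 2)
        total += risk + penalty_weight * irm_penalty(preds, y)
    return total


def make_env(rng, n, spurious_scale):
    """Hypothetical environment: column 0 is an invariant cause of y,
    column 1 is a feature whose relation to y changes per environment."""
    x_inv = rng.normal(size=n)
    y = 2.0 * x_inv  # same mechanism in every environment
    x_spur = spurious_scale * y + rng.normal(size=n)
    return np.column_stack([x_inv, x_spur]), y


rng = np.random.default_rng(0)
environments = [
    make_env(rng, 200, spurious_scale=1.0),   # stand-in for "human" data
    make_env(rng, 200, spurious_scale=-1.0),  # stand-in for "mouse" data
]

w_invariant = np.array([2.0, 0.0])  # uses only the invariant feature
w_spurious = np.array([0.0, 1.0])   # uses only the environment-specific feature

print("invariant objective:", irm_objective(w_invariant, environments))
print("spurious objective:", irm_objective(w_spurious, environments))
```

In this toy setting, the predictor that relies only on the invariant feature fits both environments exactly, so its risk and penalty are both near zero, while the predictor that relies on the environment-specific feature is penalised. In a gene-expression setting, the columns would correspond to homologue-matched genes, but that mapping is an assumption of this sketch rather than a description of the paper's pipeline.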

