Jul 30, 2023

Redundancy-aware unsupervised rankings for collections of gene sets

arXiv:2307.16182v1
h-index: 23
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of making gene set collections more interpretable for bioinformatics researchers. The advance is incremental: it builds on existing Shapley values and adds a redundancy-aware adaptation.

The paper tackles the problem of interpreting high-dimensional, overlapping, and redundant collections of gene sets in bioinformatics by proposing a Shapley values-based method to rank pathways, which reduces redundancy while maintaining high gene coverage and improves interpretability in Gene Sets Enrichment Analysis.

The biological roles of gene sets are used to group them into collections. These collections are often high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation and study of their content. Bioinformatics has looked for solutions to reduce their dimension or increase their interpretability. One possibility lies in aggregating overlapping gene sets to create larger pathways, but the modified pathways are hardly biologically justifiable. We propose to use importance scores to rank the pathways in the collections, studying the context from a set covering perspective. The proposed Shapley values-based scores consider the distribution of the singletons and the size of the sets in the families; furthermore, a trick allows us to circumvent the usual exponential complexity of Shapley values' computation. Finally, we address the challenge of including redundancy awareness in the obtained rankings where, in our case, sets are redundant if they show prominent intersections. The rankings can be used to reduce the dimension of collections of gene sets, such that they show lower redundancy and still a high coverage of the genes. We further investigate the impact of our selection on Gene Sets Enrichment Analysis. The proposed method shows practical utility in bioinformatics for increasing the interpretability of collections of gene sets and is a step forward in including redundancy in Shapley values computations.
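To make the scoring idea concrete, here is a minimal Python sketch. It is not taken from the paper; it rests on assumptions that the abstract does not confirm. The assumed game treats the gene sets as players, and a coalition of sets earns 1/|G| for every gene whose containing sets all belong to the coalition. Such a game is a sum of unanimity games, so the Shapley value of a set has a closed form: the sum, over its genes, of one over the number of sets containing that gene, divided by the number of genes. The brute-force function is included only to check the closed form on a toy input. The Jaccard-based greedy penalty, the `penalty` parameter, and all function names are hypothetical stand-ins for the redundancy-aware step.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def shapley_scores(gene_sets):
    """Closed-form Shapley values for the assumed unanimity-sum game."""
    universe = set().union(*gene_sets.values())
    # Number of sets containing each gene (the distribution of the singletons).
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    return {
        name: sum((Fraction(1, freq[g]) for g in set(genes)), Fraction(0)) / len(universe)
        for name, genes in gene_sets.items()
    }


def shapley_brute_force(gene_sets):
    """Permutation-based Shapley values; exponential, for tiny inputs only."""
    names = list(gene_sets)
    universe = set().union(*gene_sets.values())
    support = {g: {s for s in names if g in gene_sets[s]} for g in universe}

    def value(coalition):
        # Fraction of genes whose containing sets all lie in the coalition.
        covered = sum(support[g] <= coalition for g in universe)
        return Fraction(covered, len(universe))

    totals = dict.fromkeys(names, Fraction(0))
    orders = list(permutations(names))
    for order in orders:
        seen = set()
        for s in order:
            totals[s] += value(seen | {s}) - value(seen)
            seen.add(s)
    return {s: t / len(orders) for s, t in totals.items()}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundancy_aware_ranking(gene_sets, penalty=1.0):
    """Greedy ranking: scale each score down by its largest Jaccard overlap
    with the sets already ranked (a hypothetical penalty, not the paper's)."""
    scores = shapley_scores(gene_sets)
    remaining = list(gene_sets)
    ranking = []

    def adjusted(name):
        overlap = max(
            (jaccard(gene_sets[name], gene_sets[r]) for r in ranking),
            default=0.0,
        )
        return float(scores[name]) * (1.0 - penalty * overlap)

    while remaining:
        best = max(remaining, key=adjusted)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy collection: A is almost entirely contained in B, C is disjoint.
example = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "C": {"g7", "g8"},
}
scores = shapley_scores(example)
plain_ranking = sorted(example, key=scores.get, reverse=True)
aware_ranking = redundancy_aware_ranking(example)
print(plain_ranking, aware_ranking)
```

In this toy collection the plain score ordering places A right after B even though the two sets share most of their genes, while the penalized ordering promotes the disjoint set C. That is the kind of effect a redundancy-aware ranking aims for, under the assumptions stated above.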

