LGMar 24, 2025Code
MODIS: Multi-Omics Data Integration for Small and unpaired datasetsDaniel Lepe-Soltero, Thierry Artières, Anaïs Baudot et al.
An important objective in computational biology is the efficient integration of multi-omics data. The task of integration comes with challenges: multi-omics data are most often unpaired (requiring diagonal integration), partially labeled with information about biological conditions, and in some situations such as rare diseases, only very small datasets are available. We present MODIS, a semi supervised framework designed to account for these particular challenges. To address the challenge of very small datasets, we propose to exploit information contained in larger multi-omics databases by training our model on a large reference database and a small target dataset simultaneously, effectively turning the problem of transfer learning into a problem of learning with class imbalance. MODIS performs diagonal integration on unpaired samples, leveraging class-labels to align modalities despite class imbalance and data scarcity. The architecture combines multiple variational auto-encoders, a class classifier and an adversarially trained modality classifier. To ensure training stability, we adapted a regularized relativistic GAN loss to this setting. We first validate MODIS on a synthetic dataset to assess the level of supervision needed for accurate alignment and to quantify the impact of class imbalance on predictive performance. We then apply our approach to the large public TCGA database, considering between 10 and 34 classes (cancer types and normal tissue). MODIS demonstrates high prediction accuracy, robust performance with limited supervision, and stability to class imbalance. These results position MODIS as a promising solution for challenging integration scenarios, particularly diagonal integration with a small number of samples, typical of rare diseases studies. The code is available at https://github.com/VILLOUTREIXLab/MODIS.
LGAug 23, 2020Code
MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approachLéo Pio-Lopez, Alberto Valdeolivas, Laurent Tichit et al.
Network embedding approaches are gaining momentum to analyse a large variety of networks. Indeed, these approaches have demonstrated their efficiency for tasks such as community detection, node classification, and link prediction. However, very few network embedding methods have been specifically designed to handle multiplex networks, i.e. networks composed of different layers sharing the same set of nodes but having different types of edges. Moreover, to our knowledge, existing approaches cannot embed multiple nodes from multiplex-heterogeneous networks, i.e. networks composed of several layers containing both different types of nodes and edges. In this study, we propose MultiVERSE, an extension of the VERSE method with Random Walks with Restart on Multiplex (RWR-M) and Multiplex-Heterogeneous (RWR-MH) networks. MultiVERSE is a fast and scalable method to learn node embeddings from multiplex and multiplex-heterogeneous networks. We evaluate MultiVERSE on several biological and social networks and demonstrate its efficiency. MultiVERSE indeed outperforms most of the other methods in the tasks of link prediction and network reconstruction for multiplex network embedding, and is also efficient in the task of link prediction for multiplex-heterogeneous network embedding. Finally, we apply MultiVERSE to study rare disease-gene associations using link prediction and clustering. MultiVERSE is freely available on github at https://github.com/Lpiol/MultiVERSE.
SIMay 5, 2023
Zoo Guide to Network EmbeddingAnthony Baptista, Rubén J. Sánchez-García, Anaïs Baudot et al.
Networks have provided extremely successful models of data and complex systems. Yet, as combinatorial objects, networks do not have in general intrinsic coordinates and do not typically lie in an ambient space. The process of assigning an embedding space to a network has attracted lots of interest in the past few decades, and has been efficiently applied to fundamental problems in network inference, such as link prediction, node classification, and community detection. In this review, we provide a user-friendly guide to the network embedding literature and current trends in this field which will allow the reader to navigate through the complex landscape of methods and approaches emerging from the vibrant research activity on these subjects.
LGJul 9, 2021
Universal Multilayer Network Exploration by Random Walk with RestartAnthony Baptista, Aitor Gonzalez, Anaïs Baudot
The amount and variety of data is increasing drastically for several years. These data are often represented as networks, which are then explored with approaches arising from network theory. Recent years have witnessed the extension of network exploration methods to leverage more complex and richer network frameworks. Random walks, for instance, have been extended to explore multilayer networks. However, current random walk approaches are limited in the combination and heterogeneity of network layers they can handle. New analytical and numerical random walk methods are needed to cope with the increasing diversity and complexity of multilayer networks. We propose here MultiXrank, a Python package that enables Random Walk with Restart (RWR) on any kind of multilayer network with an optimized implementation. This package is supported by a universal mathematical formulation of the RWR. We evaluated MultiXrank with leave-one-out cross-validation and link prediction, and introduced protocols to measure the impact of the addition or removal of multilayer network data on prediction performances. We further measured the sensitivity of MultiXrank to input parameters by in-depth exploration of the parameter space. Finally, we illustrate the versatility of MultiXrank with different use-cases of unsupervised node prioritization and supervised classification in the context of human genetic diseases.