Rui Kuang

LG
3papers
3citations
Novelty53%
AI Score21

3 Papers

LGFeb 20, 2018
Scalable Label Propagation for Multi-relational Learning on the Tensor Product of Graphs

Zhuliu Li, Raphael Petegrosso, Shaden Smith et al.

Multi-relational learning on knowledge graphs infers high-order relations among the entities across the graphs. This learning task can be solved by label propagation on the tensor product of the knowledge graphs to learn the high-order relations as a tensor. In this paper, we generalize a widely used label propagation model to the normalized tensor product graph, and propose an optimization formulation and a scalable Low-rank Tensor-based Label Propagation algorithm (LowrankTLP) to infer multi-relations for two learning tasks, hyperlink prediction and multiple graph alignment. The optimization formulation minimizes the upper bound of the noisy tensor estimation error for multiple graph alignment, by learning with a subset of the eigen-pairs in the spectrum of the normalized tensor product graph. We also provide a data-dependent transductive Rademacher bound for binary hyperlink prediction. We accelerate LowrankTLP with parallel tensor computation which enables label propagation on a tensor product of 100 graphs each of size 1000 in less than half hour in the simulation. LowrankTLP was also applied to predicting the author-paper-venue hyperlinks in publication records, alignment of segmented regions across up to 26 CT-scan images and alignment of protein-protein interaction networks across multiple species. The experiments demonstrate that LowrankTLP indeed well approximates the original label propagation with better scalability and accuracy.

LGFeb 28, 2017
Low-rank Label Propagation for Semi-supervised Learning with 100 Millions Samples

Raphael Petegrosso, Wei Zhang, Zhuliu Li et al.

The success of semi-supervised learning crucially relies on the scalability to a huge amount of unlabelled data that are needed to capture the underlying manifold structure for better classification. Since computing the pairwise similarity between the training data is prohibitively expensive in most kinds of input data, currently, there is no general ready-to-use semi-supervised learning method/tool available for learning with tens of millions or more data points. In this paper, we adopted the idea of two low-rank label propagation algorithms, GLNP (Global Linear Neighborhood Propagation) and Kernel Nyström Approximation, and implemented the parallelized version of the two algorithms accelerated with Nesterov's accelerated projected gradient descent for Big-data Label Propagation (BigLP). The parallel algorithms are tested on five real datasets ranging from 7000 to 10,000,000 in size and a simulation dataset of 100,000,000 samples. In the experiments, the implementation can scale up to datasets with 100,000,000 samples and hundreds of features and the algorithms also significantly improved the prediction accuracy when only a very small percentage of the data is labeled. The results demonstrate that the BigLP implementation is highly scalable to big data and effective in utilizing the unlabeled data for semi-supervised learning.

CEMar 20, 2014
Network-based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis

Wei Zhang, Jae-Woong Chang, Lilong Lin et al.

High-throughput mRNA sequencing (RNA-Seq) is widely used for transcript quantification of gene isoforms. Since RNA-Seq data alone is often not sufficient to accurately identify the read origins from the isoforms for quantification, we propose to explore protein domain-domain interactions as prior knowledge for integrative analysis with RNA-seq data. We introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate protein domain-domain interaction network with short read alignments for transcript abundance estimation. Based on our observation that the abundances of the neighboring isoforms by domain-domain interactions in the network are positively correlated, Net-RSTQ models the expression of the neighboring transcripts as Dirichlet priors on the likelihood of the observed read alignments against the transcripts in one gene. The transcript abundances of all the genes are then jointly estimated with alternating optimization of multiple EM problems. In simulation Net-RSTQ effectively improved isoform transcript quantifications when isoform co-expressions correlate with their interactions. qRT-PCR results on 25 multi-isoform genes in a stem cell line, an ovarian cancer cell line, and a breast cancer cell line also showed that Net-RSTQ estimated more consistent isoform proportions with RNA-Seq data. In the experiments on the RNA-Seq data in The Cancer Genome Atlas (TCGA), the transcript abundances estimated by Net-RSTQ are more informative for patient sample classification of ovarian cancer, breast cancer and lung cancer. All experimental results collectively support that Net-RSTQ is a promising approach for isoform quantification.