ML · Sep 15, 2015

Macau: Scalable Bayesian Multi-relational Factorization with Side Information using MCMC

arXiv:1509.04610v2 · 27 citations
Originality: Incremental advance
AI Analysis

This work addresses the prediction of sparsely observed relations in large-scale multi-relational data, a problem that matters in domains such as bioinformatics. It appears incremental because it builds on existing factorization methods and mainly adds scalability enhancements.

The authors propose Macau, a scalable Bayesian factorization method for heterogeneous relational data that incorporates side information and scales to millions of entities and hundreds of millions of observations. They demonstrate it on tasks that include drug-protein activity prediction.

We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundreds of millions of observations, and sparse entity features with millions of dimensions. To achieve this scale-up, we specially designed a sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show the performance and advanced features of Macau in a set of experiments, including a challenging drug-protein activity prediction task.
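The phrase "noise injection in linear regressions" can be illustrated with a generic trick for Gaussian posteriors. What follows is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a plain Bayesian linear regression with unit noise variance and a prior N(0, I/lam) on the weights. All names (`solve`, `precision_matrix`, `sample_weights`, `lam`) are hypothetical. Under those assumptions the posterior is N(K^{-1} X^T y, K^{-1}) with K = X^T X + lam I, and a draw from it can be obtained by perturbing the targets and the right-hand side with standard normal noise and then solving one ordinary linear system, with no explicit covariance factorization.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def precision_matrix(X, lam):
    """Posterior precision K = X^T X + lam * I (unit noise variance assumed)."""
    d = len(X[0])
    return [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]


def sample_weights(X, y, lam, rng):
    """Draw one sample beta ~ N(K^{-1} X^T y, K^{-1}) via noise injection.

    The right-hand side X^T (y + e1) + sqrt(lam) * e2, with e1 and e2 standard
    normal, has covariance X^T X + lam * I = K, so K^{-1} rhs has covariance K^{-1}.
    """
    d = len(X[0])
    y_noisy = [yi + rng.gauss(0.0, 1.0) for yi in y]  # inject noise into the targets
    rhs = [
        sum(row[j] * yn for row, yn in zip(X, y_noisy)) + lam ** 0.5 * rng.gauss(0.0, 1.0)
        for j in range(d)
    ]
    return solve(precision_matrix(X, lam), rhs)
```

Calling `sample_weights(X, y, 1.0, random.Random(0))` on a list-of-lists design matrix returns one posterior draw. The dense solver here is only for illustration; at the scale the abstract describes, an iterative solver on sparse features would be the natural substitute, since the sampling step reduces to solving a regularized least-squares system.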

