ML LGJun 13, 2019

Random Tessellation Forests

Shufei Ge, Shijia Wang, Yee Whye Teh, Liangliang Wang, Lloyd T. Elliott

arXiv:1906.05440v518 citations

Originality Incremental advance

AI Analysis

This work addresses the need for more flexible partitioning methods in machine learning for multi-dimensional data, offering a novel framework that can capture complex inter-dimensional dependencies, though it builds incrementally on prior generalizations.

The authors tackled the problem of limited flexibility in space partitioning methods due to axis-aligned cuts by proposing the Random Tessellation Process (RTP), a framework that generalizes existing processes to allow non-axis aligned cuts in multi-dimensional data, resulting in improved accuracies in simulations and gene expression analysis.

Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis aligned cuts in the two dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods.

View on arXiv PDF

Similar