LGMLOct 26, 2018

Efficient learning of neighbor representations for boundary trees and forests

arXiv:1810.11165v1
Originality Incremental advance
AI Analysis

This work addresses a computational bottleneck for neighbor-based classification methods, offering an incremental improvement for machine learning practitioners handling large datasets.

The paper tackles the computational scalability and accuracy limitations of the Differentiable Boundary Tree algorithm by introducing Differentiable Boundary Sets, which reduces training time and improves classification accuracy and data representability on datasets like MNIST and Fashion-MNIST.

We introduce a semiparametric approach to neighbor-based classification. We build off the recently proposed Boundary Trees algorithm by Mathy et al.(2015) which enables fast neighbor-based classification, regression and retrieval in large datasets. While boundary trees use an Euclidean measure of similarity, the Differentiable Boundary Tree algorithm by Zoran et al.(2017) was introduced to learn low-dimensional representations of complex input data, on which semantic similarity can be calculated to train boundary trees. As is pointed out by its authors, the differentiable boundary tree approach contains a few limitations that prevents it from scaling to large datasets. In this paper, we introduce Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree scheme and also improves its classification accuracy and data representability. Our algorithm is efficiently implementable with existing tools and offers a significant reduction in training time. We test and compare the algorithms on the well known MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by Xiao et al.(2017).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes