LG, ML · Oct 26, 2018

Efficient learning of neighbor representations for boundary trees and forests

arXiv:1810.11165v1 · 0.8

Originality: Incremental advance

AI Analysis

This work addresses a computational bottleneck for neighbor-based classification methods, offering an incremental improvement for machine learning practitioners handling large datasets.

The paper tackles the computational scalability and accuracy limitations of the Differentiable Boundary Tree algorithm by introducing Differentiable Boundary Sets, which reduces training time and improves classification accuracy and data representability on datasets like MNIST and Fashion-MNIST.

We introduce a semiparametric approach to neighbor-based classification. We build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015), which enables fast neighbor-based classification, regression and retrieval in large datasets. While boundary trees use a Euclidean measure of similarity, the Differentiable Boundary Tree algorithm by Zoran et al. (2017) was introduced to learn low-dimensional representations of complex input data, on which semantic similarity can be calculated to train boundary trees. As its authors point out, the differentiable boundary tree approach has a few limitations that prevent it from scaling to large datasets. In this paper, we introduce Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree scheme and also improves its classification accuracy and data representability. Our algorithm is efficiently implementable with existing tools and offers a significant reduction in training time. We test and compare the algorithms on the well-known MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by Xiao et al. (2017).
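For readers unfamiliar with the baseline, the following is a minimal sketch of a plain boundary tree with Euclidean distance, written from a general understanding of the scheme attributed to Mathy et al. (2015). It is not the paper's code and it does not implement Differentiable Boundary Trees or Differentiable Boundary Sets. The names `Node`, `BoundaryTree`, `max_children`, `query`, `train` and `predict` are hypothetical and chosen only for illustration.

```python
import math


class Node:
    """One stored training example (hypothetical helper class)."""

    def __init__(self, point, label):
        self.point = point
        self.label = label
        self.children = []


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BoundaryTree:
    """Illustrative boundary tree; assumes the greedy traversal rule below."""

    def __init__(self, max_children=math.inf):
        self.root = None
        self.max_children = max_children  # assumed cap on children per node

    def query(self, point):
        """Greedily walk from the root to a locally closest node."""
        node = self.root
        while True:
            candidates = list(node.children)
            # The current node competes with its children only while it
            # still has room for another child.
            if len(node.children) < self.max_children:
                candidates.append(node)
            best = min(candidates, key=lambda n: euclidean(n.point, point))
            if best is node:
                return node
            node = best

    def train(self, point, label):
        """Store the example only if the tree currently mislabels it."""
        if self.root is None:
            self.root = Node(point, label)
            return
        nearest = self.query(point)
        if nearest.label != label:
            nearest.children.append(Node(point, label))

    def predict(self, point):
        return self.query(point).label
```

Under these assumptions, examples that the tree already labels correctly are discarded, so the stored nodes tend to concentrate near class boundaries. The differentiable variants discussed in the abstract replace the raw Euclidean distance with a distance computed on a learned representation, which this sketch leaves out.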
