LGMar 14, 2018

Self-Similar Epochs: Value in Arrangement

arXiv:1803.05389v31 citations
Originality Incremental advance
AI Analysis

This addresses a fundamental optimization bottleneck in machine learning training, offering a novel enhancement to SGD that could benefit practitioners in domains like recommendation systems, though it is incremental in nature.

The paper tackles the problem of inefficient training in stochastic gradient descent due to random ordering of examples, proposing self-similar arrangements that preserve data structure to accelerate training. It demonstrates training speed improvements of 3%-37% on synthetic and recommendation datasets for matrix factorization tasks.

Optimization of machine learning models is commonly performed through stochastic gradient updates on randomly ordered training examples. This practice means that sub-epochs comprise of independent random samples of the training data that may not preserve informative structure present in the full data. We hypothesize that the training can be more effective with {\em self-similar} arrangements that potentially allow each epoch to provide benefits of multiple ones. We study this for "matrix factorization" -- the common task of learning metric embeddings of entities such as queries, videos, or words from example pairwise associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe training acceleration of 3\%-37\% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful enhancement to SGD that merits further exploration.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes