Self-Similar Epochs: Value in Arrangement
This addresses a fundamental optimization bottleneck in machine learning training, offering a novel enhancement to SGD that could benefit practitioners in domains like recommendation systems, though it is incremental in nature.
The paper tackles the problem of inefficient training in stochastic gradient descent due to random ordering of examples, proposing self-similar arrangements that preserve data structure to accelerate training. It demonstrates training speed improvements of 3%-37% on synthetic and recommendation datasets for matrix factorization tasks.
Optimization of machine learning models is commonly performed through stochastic gradient updates on randomly ordered training examples. This practice means that sub-epochs comprise of independent random samples of the training data that may not preserve informative structure present in the full data. We hypothesize that the training can be more effective with {\em self-similar} arrangements that potentially allow each epoch to provide benefits of multiple ones. We study this for "matrix factorization" -- the common task of learning metric embeddings of entities such as queries, videos, or words from example pairwise associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe training acceleration of 3\%-37\% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful enhancement to SGD that merits further exploration.