LG, Jul 15, 2020
Experimental Design for Bathymetry Editing
Julaiti Alafate, Yoav Freund, David T. Sandwell et al.
We describe an application of machine learning to a real-world computer-assisted labeling task. Our experimental results expose significant deviations from the IID assumption commonly used in machine learning. These results suggest that the common practice of randomly splitting all data into training and test sets can often lead to poor performance.
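To illustrate the concern about random splits, here is a minimal sketch of one common alternative: holding out whole groups of records so that the test set contains only groups never seen in training. The abstract does not say which design the authors use. The function name `split_by_group`, the `cruise` field, and the default fraction are hypothetical and chosen only for illustration.

```python
import random


def split_by_group(records, group_key, test_fraction=0.25, seed=0):
    """Split records so that no group appears in both train and test.

    `records` is a list of dicts; `group_key` names the field that
    identifies the group (hypothetical, e.g. a survey or cruise id).
    """
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one whole group.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Hypothetical data: four groups with three records each.
records = [{"cruise": c, "depth": d} for c in "ABCD" for d in (10, 20, 30)]
train, test = split_by_group(records, "cruise")
```

If the data really are not IID across groups, an estimate from this kind of split is likely to be closer to performance on a new group than one from a per-record random split.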
LG, Jan 25, 2019
Faster Boosting with Smaller Memory
Julaiti Alafate, Yoav Freund
State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when the memory size is small. This is achieved using a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
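The abstract names effective sample size without defining it. A minimal sketch, assuming the usual definition for weighted samples, n_eff = (sum of weights)^2 / (sum of squared weights), is shown below. The helper `needs_resample` and its `threshold` parameter are hypothetical and are not taken from the paper.

```python
def effective_sample_size(weights):
    """Effective sample size of a weighted sample: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        return 0.0  # empty or all-zero weights carry no information
    return total * total / total_sq


def needs_resample(weights, threshold=0.5):
    """Hypothetical rule: resample when n_eff falls below a fraction of n."""
    return effective_sample_size(weights) < threshold * len(weights)


uniform = [1.0, 1.0, 1.0, 1.0]   # every example counts equally
skewed = [1.0, 0.0, 0.0, 0.0]    # one example carries all the weight
```

With uniform weights the effective sample size equals the number of examples. As boosting concentrates weight on a few hard examples, it shrinks toward 1, which is a natural signal that the in-memory sample has stopped being informative.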
LG, May 19, 2018
Tell Me Something New: A New Framework for Asynchronous Parallel Learning
Julaiti Alafate, Yoav Freund
We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem.
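The broadcast rule described above can be sketched as a single-process simulation. This is an illustration under stated assumptions, not the authors' implementation: the `Worker` class, the `min_gain` margin that decides what counts as "something new", and the list-based inboxes standing in for a network broadcast are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    best_loss: float = float("inf")
    inbox: list = field(default_factory=list)

    def observe(self, loss, peers, min_gain=0.01):
        """Record a local result; broadcast only if it is a clear improvement."""
        if loss < self.best_loss - min_gain:
            self.best_loss = loss
            for peer in peers:
                peer.inbox.append((self.name, loss))  # stand-in for broadcast
            return True
        return False  # nothing new, so stay silent

    def drain_inbox(self):
        """Adopt any peer result that beats the local best; never waits."""
        for _, loss in self.inbox:
            if loss < self.best_loss:
                self.best_loss = loss
        self.inbox.clear()


a, b = Worker("a"), Worker("b")
sent_first = a.observe(0.5, [b])     # first result is always new
b.drain_inbox()                      # b adopts a's result
sent_small = b.observe(0.495, [a])   # gain below min_gain: no broadcast
sent_large = b.observe(0.4, [a])     # clear improvement: broadcast
a.drain_inbox()
```

No worker waits on another and no head node is involved. A worker that never drains its inbox or never reports only slows itself, which matches the resilience to laggards that the abstract claims.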