Power-law Scaling to Assist with Key Challenges in Artificial Intelligence
This work provides a method for estimating a priori the dataset size needed to reach a target accuracy and for benchmarking training complexity, addressing challenges in AI applications. It appears incremental, however, since it applies an existing concept to deep learning.
The paper tackles the problem of predicting how test error converges in deep learning by applying power-law scaling. It shows that test errors on handwritten digit datasets decrease as a power law with database size, that the exponent increases with the number of hidden layers, and that with a single training epoch the estimated test error approaches state-of-the-art accuracy.
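The scaling relation can be written compactly as below. The notation is an assumption chosen for illustration and is not taken from the paper: ε is the optimized test error, D is the database size, and A and ρ are constants fitted to the data.

```latex
\epsilon(D) \simeq A\,D^{-\rho}, \qquad \rho > 0
\quad\Longrightarrow\quad
\log \epsilon(D) \simeq \log A - \rho \log D
```

On a log-log plot the power law is therefore a straight line of slope −ρ, and a larger exponent ρ means the test error falls toward zero faster as the database grows.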
Power-law scaling, a central concept in critical phenomena, is found to be useful in deep learning, where optimized test errors on handwritten digit examples converge to zero as a power law with database size. For rapid decision making with one training epoch, in which each example is presented only once to the trained network, the power-law exponent increased with the number of hidden layers. For the largest dataset, the obtained test error was estimated to be close to that of state-of-the-art algorithms trained for large numbers of epochs. Power-law scaling assists with key challenges in current artificial intelligence applications and enables a priori estimation of the dataset size required to achieve a desired test accuracy. It also establishes a benchmark for measuring training complexity and a quantitative hierarchy of machine learning tasks and algorithms.
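The a priori dataset size estimate described above can be sketched as follows, assuming the test error follows ε(D) ≈ A·D^(−ρ). The exponent is fitted by least squares on log-transformed data and the relation is then inverted for a target error. The function names and the synthetic, noise-free numbers are hypothetical illustrations, not the paper's procedure or results.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~= A * sizes**(-rho) by linear least squares in log-log space.

    Returns the tuple (A, rho).
    """
    log_d = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # A power law is a straight line in log-log space: log e = log A - rho * log D.
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    slope, intercept = np.polyfit(log_d, log_e, 1)
    return float(np.exp(intercept)), float(-slope)


def required_dataset_size(A, rho, target_error):
    """Invert eps = A * D**(-rho) to estimate D for a desired test error."""
    return (A / target_error) ** (1.0 / rho)


if __name__ == "__main__":
    # Hypothetical synthetic data (NOT from the paper): errors generated
    # exactly from A = 2.0 and rho = 0.5 at a few dataset sizes.
    sizes = np.array([1_000, 5_000, 10_000, 30_000, 60_000])
    errors = 2.0 * sizes ** -0.5

    A, rho = fit_power_law(sizes, errors)
    print(f"A = {A:.3f}, rho = {rho:.3f}")

    d_star = required_dataset_size(A, rho, target_error=0.005)
    print(f"estimated dataset size for 0.5% test error: {d_star:,.0f}")
```

Under this assumption, fitting ρ separately for networks with different numbers of hidden layers and comparing the fitted exponents is one way to turn the scaling behavior into a quantitative benchmark. Real measurements are noisy, so a fit over more dataset sizes, or a weighted fit, would be advisable.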