Optimisation of Overparametrized Sum-Product Networks
This work addresses optimization efficiency for machine learning practitioners using sum-product networks, but it appears incremental as it builds on known advantages of deep structures.
The paper investigates how overparameterization in deep sum-product networks accelerates parameter optimization compared to shallow models, showing through analysis and experiments that deep networks exhibit implicit acceleration akin to gradient ascent with adaptive learning rates and momentum.
It seems to be a pearl of conventional wisdom that parameter learning in deep sum-product networks is surprisingly fast compared to shallow mixture models. This paper examines the effects of overparameterization in sum-product networks on the speed of parameter optimisation. Using theoretical analysis and empirical experiments, we show that deep sum-product networks exhibit an implicit acceleration compared to their shallow counterparts. In fact, gradient-based optimisation in deep tree-structured sum-product networks is equivalent to gradient ascent with adaptive and time-varying learning rates and additional momentum terms.
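The adaptive learning rate can be illustrated with a toy calculation. The sketch below is not the paper's derivation. It assumes a single unconstrained scalar weight w that is overparameterized as a product w = w1 * w2, ignores the normalisation of sum-node weights, and uses hypothetical function names. One gradient ascent step on w1 and w2 then induces an update on w whose effective step size is scaled by (w1^2 + w2^2), plus a second-order term. The momentum terms mentioned in the abstract are not reproduced here.

```python
def shallow_step(w, grad_w, lr):
    """Plain gradient ascent on a single weight w."""
    return w + lr * grad_w


def deep_step(w1, w2, grad_w, lr):
    """Gradient ascent on the factors of w = w1 * w2.

    By the chain rule, dL/dw1 = grad_w * w2 and dL/dw2 = grad_w * w1,
    where grad_w is the gradient with respect to the product w.
    """
    return w1 + lr * grad_w * w2, w2 + lr * grad_w * w1


def induced_update(w1, w2, grad_w, lr):
    """Closed form of the update that deep_step induces on w = w1 * w2.

    Expanding (w1 + lr*g*w2) * (w2 + lr*g*w1) gives
    w + lr * (w1^2 + w2^2) * g + lr^2 * g^2 * w,
    so the first-order step size is lr * (w1^2 + w2^2).
    """
    w = w1 * w2
    return w + lr * (w1 ** 2 + w2 ** 2) * grad_w + lr ** 2 * grad_w ** 2 * w


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    w1, w2, grad_w, lr = 0.5, 0.8, 1.5, 0.1
    new_w1, new_w2 = deep_step(w1, w2, grad_w, lr)
    print("shallow update of w:", shallow_step(w1 * w2, grad_w, lr))
    print("deep update of w:   ", new_w1 * new_w2)
    print("closed-form update: ", induced_update(w1, w2, grad_w, lr))
```

Because the scaling factor (w1^2 + w2^2) changes as the factors are updated, the effective learning rate on w varies over time, which is the flavour of the adaptive behaviour described above, under the stated simplifying assumptions.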