ML · LG · Dec 30, 2024

Soft Diamond Regularizers for Deep Learning

arXiv:2412.20724v2 · 3.1 · h-index: 6

Originality: Incremental advance

AI Analysis

This work improves the regularization of deep learning models to raise both accuracy and weight sparsity. The advance is incremental because it builds on existing regularization techniques.

The authors introduced soft diamond regularizers based on thick-tailed symmetric alpha-stable ($S\alpha S$) distributions to improve deep-learning performance. Compared with the standard lasso and ridge regularizers, these regularizers achieved better accuracy and sparsity on image datasets (CIFAR-10, CIFAR-100, Caltech-256) and on German-to-English translation (IWSLT-2016).
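Under the usual Bayesian reading of ridge and lasso, each weight prior becomes a regularizer through its negative logarithm. The sketch below states that standard MAP objective. It assumes independent weights, a data loss $\mathcal{L}$, a regularization weight $\lambda$, and the $S\alpha S$ parametrization with dispersion $\gamma$ shown here. The summary specifies none of these, so the paper's exact formulation may differ.

$$
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \Big[ \mathcal{L}(\mathbf{w}) - \lambda \sum_i \log p(w_i) \Big]
$$

A Gaussian prior $p(w) \propto e^{-w^2/(2\sigma^2)}$ turns the penalty into the ridge term $\lambda \sum_i w_i^2/(2\sigma^2)$. A Laplacian prior $p(w) \propto e^{-|w|/b}$ turns it into the lasso term $\lambda \sum_i |w_i|/b$. An $S\alpha S$ prior with characteristic function $e^{-\gamma|t|^\alpha}$ has the density

$$
p_\alpha(w) = \frac{1}{\pi} \int_0^\infty e^{-\gamma t^\alpha} \cos(wt)\, dt, \qquad 0 < \alpha \le 2,
$$

which reduces to a Gaussian with variance $2\gamma$ at $\alpha = 2$ and to the Cauchy density $\gamma / \big(\pi(\gamma^2 + w^2)\big)$ at $\alpha = 1$. For $\alpha < 2$ the tails follow a power law, $p_\alpha(w) \sim C_{\alpha,\gamma}\, |w|^{-(1+\alpha)}$. The penalty $-\log p_\alpha(w)$ therefore grows only like $(1+\alpha)\log|w|$ for large $|w|$, not quadratically.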

This chapter presents a new family of soft diamond synaptic regularizers based on thick-tailed symmetric alpha-stable ($S\alpha S$) probability bell curves. These new parametrized weight priors improved deep-learning performance on image and language-translation test sets. They also increased the sparsity of the trained weights. They outperformed the state-of-the-art hard-diamond Laplacian regularizer of sparse lasso regression and classification.

The $S\alpha S$ synaptic weight priors have power-law bell-curve tails that are thicker than the thin exponential tails of the Gaussian bell curves that underlie ridge regularizers. Their tails get thicker as the $\alpha$ parameter decreases. These thicker tails model more impulsive behavior and allow occasional distant search in synaptic weight spaces of extremely high dimension. Their constraint sets have a diamond-like geometry. The shape ranges from a circle to a star or diamond, depending on the $\alpha$ tail thickness and the dispersion of the $S\alpha S$ weight prior.

These $S\alpha S$ bell curves generally lack a closed form, which makes direct training computationally intensive. We removed this computational bottleneck by using a precomputed look-up table. We tested the soft diamond regularizers with deep neural classifiers on image test sets and with deep neural machine-translation models on German-to-English text translation. The image simulations used three datasets: CIFAR-10, CIFAR-100, and Caltech-256. On these, the regularizers improved both the accuracy and the sparsity of the classifiers. The translation tests used the IWSLT-2016 Evaluation dataset, where the soft diamond regularizers again outperformed ridge and lasso regularizers. These findings recommend the sub-Cauchy $\alpha = 0.5$ soft diamond regularizer as a competitive and sparse regularizer for large-scale machine learning.
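The look-up-table idea can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation. It makes three assumptions:

- The regularizer is the negative log of the $S\alpha S$ density with characteristic function $e^{-\gamma|t|^\alpha}$ (dispersion $\gamma$).
- That density is evaluated once per grid point by numerical integration.
- During training, the penalty and its slope are read off the table by linear interpolation.

The names `sas_pdf` and `SoftDiamondTable` are illustrative, as are the grid range `w_max`, the grid `size`, and the quadrature settings. The abstract does not specify any of them.

```python
import math


def sas_pdf(w, alpha, gamma=1.0, n=8000, cutoff=27.7):
    """Density of a symmetric alpha-stable (SaS) law at w.

    Assumes the characteristic function exp(-gamma * |t|**alpha), so
    p(w) = (1/pi) * integral_0^inf exp(-gamma * t**alpha) * cos(w * t) dt.
    The substitution t = u**m (m >= 2 and m * alpha >= 1) smooths the
    integrand near t = 0; composite Simpson's rule then integrates over u
    up to the point where exp(-gamma * t**alpha) has fallen to exp(-cutoff).
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    m = max(2, math.ceil(1.0 / alpha))
    u_max = (cutoff / gamma) ** (1.0 / (m * alpha))
    h = u_max / n
    total = 0.0
    for k in range(n + 1):
        u = k * h
        t = u ** m
        f = math.exp(-gamma * t ** alpha) * math.cos(w * t) * m * u ** (m - 1)
        # Simpson weights: 1 at the ends, 4 at odd nodes, 2 at even nodes.
        total += f if k in (0, n) else (4.0 * f if k % 2 else 2.0 * f)
    return total * h / (3.0 * math.pi)


class SoftDiamondTable:
    """Hypothetical precomputed table of r(w) = -log p(w) + log p(0).

    The constant log p(0) only shifts the penalty so that r(0) = 0; it does
    not change gradients. Values live on a uniform grid over [0, w_max] and
    are read off by linear interpolation, using the symmetry r(-w) = r(w).
    """

    def __init__(self, alpha, gamma=1.0, w_max=5.0, size=101):
        self.dw = w_max / (size - 1)
        # The floor guards log() against tiny negative quadrature round-off.
        logs = [math.log(max(sas_pdf(k * self.dw, alpha, gamma), 1e-300))
                for k in range(size)]
        self.r = [logs[0] - lp for lp in logs]

    def _cell(self, a):
        # Grid cell holding |w| = a; past w_max the last cell is reused,
        # i.e. the table is extrapolated linearly (a crude choice here).
        return min(int(a / self.dw), len(self.r) - 2)

    def penalty(self, w):
        a = abs(w)
        i = self._cell(a)
        frac = a / self.dw - i
        return self.r[i] + frac * (self.r[i + 1] - self.r[i])

    def grad(self, w):
        # Slope of the interpolated penalty; 0 at w = 0 by symmetry.
        if w == 0:
            return 0.0
        i = self._cell(abs(w))
        slope = (self.r[i + 1] - self.r[i]) / self.dw
        return math.copysign(slope, w)


if __name__ == "__main__":
    sub_cauchy = SoftDiamondTable(alpha=0.5)   # the recommended alpha = 0.5
    gaussian = SoftDiamondTable(alpha=2.0)     # alpha = 2: ridge-like penalty
    for w in (0.5, 1.0, 2.0, 4.0):
        print(f"w = {w:.1f}   alpha = 0.5: {sub_cauchy.penalty(w):.3f}   "
              f"alpha = 2: {gaussian.penalty(w):.3f}")
```

In a gradient step, `grad(w)` would be scaled by a regularization weight and added to the data-loss gradient of each weight. The integral never runs inside the training loop. The quadrature settings here are tuned for roughly $0.5 \le \alpha \le 2$ and $|w| \le 5$. Smaller $\alpha$ or a wider grid would need more integration points.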

