Gustavo Silva

LGNov 20, 2023

Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $ε$ via Gradient Magnitude Histogram Analysis

Gustavo Silva, Paul Rodriguez

Stochastic optimizers play a crucial role in the successful training of deep neural network models. To achieve optimal model performance, designers must carefully select both model and optimizer hyperparameters. However, this process is frequently demanding in terms of computational resources and processing time. While it is a well-established practice to tune the entire set of optimizer hyperparameters for peak performance, there is still a lack of clarity regarding the individual influence of hyperparameters mislabeled as "low priority", including the safeguard factor $ε$ and decay rate $β$, in leading adaptive stochastic optimizers like the Adam optimizer. In this manuscript, we introduce a new framework based on the empirical probability density function of the loss' gradient magnitude, termed as the "gradient magnitude histogram", for a thorough analysis of adaptive stochastic optimizers and the safeguard hyperparameter $ε$. This framework reveals and justifies valuable relationships and dependencies among hyperparameters in connection to optimal performance across diverse tasks, such as classification, language modeling and machine translation. Furthermore, we propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard hyperparameter $ε$, surpassing the conventional trial-and-error methodology by establishing a worst-case search space that is two times narrower.

LGNov 19, 2020

Efficient Consensus Model based on Proximal Gradient Method applied to Convolutional Sparse Problems

Gustavo Silva, Paul Rodriguez

Convolutional sparse representation (CSR), shift-invariant model for inverse problems, has gained much attention in the fields of signal/image processing, machine learning and computer vision. The most challenging problems in CSR implies the minimization of a composite function of the form $min_x \sum_i f_i(x) + g(x)$, where a direct and low-cost solution can be difficult to achieve. However, it has been reported that semi-distributed formulations such as ADMM consensus can provide important computational benefits. In the present work, we derive and detail a thorough theoretical analysis of an efficient consensus algorithm based on proximal gradient (PG) approach. The effectiveness of the proposed algorithm with respect to its ADMM counterpart is primarily assessed in the classic convolutional dictionary learning problem. Furthermore, our consensus method, which is generically structured, can be used to solve other optimization problems, where a sum of convex functions with a regularization term share a single global variable. As an example, the proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.

Gustavo Silva

2 Papers