LG AI CV | Oct 5, 2023

Accelerated Neural Network Training with Rooted Logistic Objectives

Zhu Wang, Praveen Raj Veluswami, Harsh Mishra, Sathya N. Ravi

arXiv:2310.03890v12.0 | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses training efficiency for neural networks, but it appears incremental as it modifies an existing loss function rather than introducing a new paradigm.

The paper tackles the problem of slow neural network training by proposing a rooted logistic objective that is strictly convex, leading to faster convergence and performance improvements on classification benchmarks and generative modeling applications.

Many neural networks deployed in real-world scenarios are trained using cross-entropy-based loss functions. From the optimization perspective, it is known that the behavior of first-order methods such as gradient descent crucially depends on the separability of datasets. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) the condition number of the data matrix, and (2) the separability of the dataset. Without further pre-processing techniques such as over-parametrization or data augmentation, separability is an intrinsic quantity of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of strictly convex functions that are at least as strict as logistic loss. The minimizers of these functions coincide with those of the minimum-norm solution wherever possible. The strict convexity of the derived function can be extended to finetune state-of-the-art models and applications. In our empirical analysis, we apply the proposed rooted logistic objective to multiple deep models, e.g., fully-connected neural networks and transformers, on various classification benchmarks. Our results illustrate that training with the rooted loss function converges faster and yields performance improvements. Furthermore, we illustrate applications of our novel rooted loss function in generative-modeling-based downstream applications, such as finetuning a StyleGAN model with the rooted loss. The code implementing our losses and models is available for open-source software development purposes at: https://anonymous.4open.science/r/rooted_loss.
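
The abstract does not state the formula for the rooted logistic objective, so the following is only a minimal sketch under an assumed form. It assumes the loss replaces the logarithm in the logistic loss log(1 + exp(-z)) with its k-th root approximation k(x^(1/k) - 1), which tends to log x as k grows. The function names, the parameter `k`, and its default value are hypothetical and are not taken from the authors' code.

```python
import math


def logistic_loss(margin: float) -> float:
    """Standard logistic loss log(1 + exp(-margin)), computed stably."""
    # log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|)) avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rooted_logistic_loss(margin: float, k: float = 8.0) -> float:
    """ASSUMED rooted form: k * ((1 + exp(-margin)) ** (1 / k) - 1).

    This form is an assumption for illustration; the abstract does not
    give the authors' definition.
    """
    # (1 + exp(-m)) ** (1/k) equals exp(logistic_loss(m) / k), so expm1
    # evaluates the bracketed difference without cancellation for large k.
    return k * math.expm1(logistic_loss(margin) / k)


if __name__ == "__main__":
    # Compare the two losses at a few margins; larger k tracks the
    # logistic loss more closely under the assumed form.
    for m in (-2.0, 0.0, 2.0):
        print(m, logistic_loss(m), rooted_logistic_loss(m, k=8.0))
```

Under this assumed form, the rooted loss upper-bounds the logistic loss for every margin (because exp(t) - 1 >= t) and approaches it as `k` increases, which is one way to read "a sequence of strictly convex functions" in the abstract.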
