Siddharth Krishna Kumar

LG
3papers
271citations
Novelty72%
AI Score33

3 Papers

LGJan 16, 2024
GD doesn't make the cut: Three ways that non-differentiability affects neural network training

Siddharth Krishna Kumar

This paper critically examines the fundamental distinctions between gradient methods applied to non-differentiable functions (NGDMs) and classical gradient descents (GDs) for differentiable functions, revealing significant gaps in current deep learning optimization theory. We demonstrate that NGDMs exhibit markedly different convergence properties compared to GDs, strongly challenging the applicability of extensive neural network convergence literature based on $L-smoothness$ to non-smooth neural networks. Our analysis reveals paradoxical behavior of NDGM solutions for $L_{1}$-regularized problems, where increasing regularization counterintuitively leads to larger $L_{1}$ norms of optimal solutions. This finding calls into question widely adopted $L_{1}$ penalization techniques for network pruning. We further challenge the common assumption that optimization algorithms like RMSProp behave similarly in differentiable and non-differentiable contexts. Expanding on the Edge of Stability phenomenon, we demonstrate its occurrence in a broader class of functions, including Lipschitz continuous convex differentiable functions. This finding raises important questions about its relevance and interpretation in non-convex, non-differentiable neural networks, particularly those using ReLU activations. Our work identifies critical misunderstandings of NDGMs in influential literature, stemming from an overreliance on strong smoothness assumptions. These findings necessitate a reevaluation of optimization dynamics in deep learning, emphasizing the crucial need for more nuanced theoretical foundations in analyzing these complex systems.

CVJul 26, 2018
A general metric for identifying adversarial images

Siddharth Krishna Kumar

It is well known that a determined adversary can fool a neural network by making imperceptible adversarial perturbations to an image. Recent studies have shown that these perturbations can be detected even without information about the neural network if the strategy taken by the adversary is known beforehand. Unfortunately, these studies suffer from the generalization limitation -- the detection method has to be recalibrated every time the adversary changes his strategy. In this study, we attempt to overcome the generalization limitation by deriving a metric which reliably identifies adversarial images even when the approach taken by the adversary is unknown. Our metric leverages key differences between the spectra of clean and adversarial images when an image is treated as a matrix. Our metric is able to detect adversarial images across different datasets and attack strategies without any additional re-calibration. In addition, our approach provides geometric insights into several unanswered questions about adversarial perturbations.

LGApr 28, 2017
On weight initialization in deep neural networks

Siddharth Krishna Kumar

A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.