Rishabh Sabharwal

h-index2
2papers

2 Papers

LGSep 15, 2023
A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings

Alokendu Mazumder, Rishabh Sabharwal, Manan Tayal et al.

In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why it is important for the convergence of Adam to use a fixed step size. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non convex optimisation scenarios. (i) First, we provide proof that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with constant step size, and we give bounds on the running time. Both deterministic and stochastic versions of Adam are analysed in this paper. We show sufficient conditions for the derived constant step size to achieve asymptotic convergence of the gradients to zero with minimal assumptions. Next, (ii) we design experiments to empirically study Adam's convergence with our proposed constant step size against stateof the art step size schedulers on classification tasks. Lastly, (iii) we also demonstrate that our derived constant step size has better abilities in reducing the gradient norms, and empirically, we show that despite the accumulation of a few past gradients, the key driver for convergence in Adam is the non-increasing step sizes.

CVDec 12, 2024
STEAM: Squeeze and Transform Enhanced Attention Module

Rishabh Sabharwal, Ram Samarth B B, Parikshit Singh Rathore et al.

Channel and spatial attention mechanisms introduced by earlier works enhance the representation abilities of deep convolutional neural networks (CNNs) but often lead to increased parameter and computation costs. While recent approaches focus solely on efficient feature context modeling for channel attention, we aim to model both channel and spatial attention comprehensively with minimal parameters and reduced computation. Leveraging the principles of relational modeling in graphs, we introduce a constant-parameter module, STEAM: Squeeze and Transform Enhanced Attention Module, which integrates channel and spatial attention to enhance the representation power of CNNs. To our knowledge, we are the first to propose a graph-based approach for modeling both channel and spatial attention, utilizing concepts from multi-head graph transformers. Additionally, we introduce Output Guided Pooling (OGP), which efficiently captures spatial context to further enhance spatial attention. We extensively evaluate STEAM for large-scale image classification, object detection and instance segmentation on standard benchmark datasets. STEAM achieves a 2% increase in accuracy over the standard ResNet-50 model with only a meager increase in GFLOPs. Furthermore, STEAM outperforms leading modules ECA and GCT in terms of accuracy while achieving a three-fold reduction in GFLOPs.