CLNov 28, 2022
GPT-Neo for commonsense reasoning -- a theoretical and practical lensRohan Kashyap, Vivek Kashyap, Narendra C. P.
Recent work has demonstrated substantial gains in pre-training large-language models (LLMs) followed by supervised fine-tuning on the downstream task. In this paper, we evaluate the performance of the GPT-neo model using $6$ commonsense reasoning benchmark tasks. We aim to examine the performance of smaller models using the GPT-neo models against several larger model baselines such as GPT-$3$, Llama-$2$, MPT and Falcon. Upon fine-tuning with the appropriate set of hyperparameters, our model achieves competitive accuracy on several tasks. We also investigate and substantiate our results using attention-head visualization to better understand the model performance. Finally, we conduct various robustness tests using various methods to gauge the model performance under numerous settings.
LGNov 28, 2022
A survey of deep learning optimizers -- first and second order methodsRohan Kashyap
Deep Learning optimization involves minimizing a high-dimensional loss function in the weight space which is often perceived as difficult due to its inherent difficulties such as saddle points, local minima, ill-conditioning of the Hessian and limited compute resources. In this paper, we provide a comprehensive review of $14$ standard optimization methods successfully used in deep learning research and a theoretical assessment of the difficulties in numerical optimization from the optimization literature.
LGSep 6, 2023
A Unified Framework for Discovering Discrete SymmetriesPavan Karjol, Rohan Kashyap, Aditya Gopalan et al.
We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner. The structure of the architecture enables us to leverage multi-armed bandit algorithms and gradient descent to efficiently optimize over the linear and the non-linear functions, respectively, and to infer the symmetry that is ultimately learnt. We also discuss the necessity of the matrix-valued functions in the architecture. Experiments on image-digit sum and polynomial regression tasks demonstrate the effectiveness of our approach.
LGSep 11, 2023
Neural Discovery of Permutation SubgroupsPavan Karjol, Rohan Kashyap, Prathosh A P
We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear transformation. We also prove similar results for cyclic and dihedral subgroups. Finally, we provide a general theorem that can be extended to discover other subgroups of $S_{n}$. We also demonstrate the applicability of our results through numerical experiments on image-digit sum and symmetric polynomial regression tasks.
LGSep 26, 2025
Automatic Discovery of One-Parameter Subgroups of Lie Groups: Compact and Non-Compact Cases of $\mathbf{SO(n)}$ and $\mathbf{SL(n)}$Pavan Karjol, Vivek V Kashyap, Rohan Kashyap et al.
We introduce a novel framework for the automatic discovery of one-parameter subgroups ($H_γ$) of $SO(3)$ and, more generally, $SO(n)$. One-parameter subgroups of $SO(n)$ are crucial in a wide range of applications, including robotics, quantum mechanics, and molecular structure analysis. Our method utilizes the standard Jordan form of skew-symmetric matrices, which define the Lie algebra of $SO(n)$, to establish a canonical form for orbits under the action of $H_γ$. This canonical form is then employed to derive a standardized representation for $H_γ$-invariant functions. By learning the appropriate parameters, the framework uncovers the underlying one-parameter subgroup $H_γ$. The effectiveness of the proposed approach is demonstrated through tasks such as double pendulum modeling, moment of inertia prediction, top quark tagging and invariant polynomial regression, where it successfully recovers meaningful subgroup structure and produces interpretable, symmetry-aware representations.
LGSep 26, 2025
Learning Equivariant Functions via Quadratic FormsPavan Karjol, Vivek V Kashyap, Rohan Kashyap et al.
In this study, we introduce a method for learning group (known or unknown) equivariant functions by learning the associated quadratic form $x^T A x$ corresponding to the group from the data. Certain groups, known as orthogonal groups, preserve a specific quadratic form, and we leverage this property to uncover the underlying symmetry group under the assumption that it is orthogonal. By utilizing the corresponding unique symmetric matrix and its inherent diagonal form, we incorporate suitable inductive biases into the neural network architecture, leading to models that are both simplified and efficient. Our approach results in an invariant model that preserves norms, while the equivariant model is represented as a product of a norm-invariant model and a scale-invariant model, where the ``product'' refers to the group action. Moreover, we extend our framework to a more general setting where the function acts on tuples of input vectors via a diagonal (or product) group action. In this extension, the equivariant function is decomposed into an angular component extracted solely from the normalized first vector and a scale-invariant component that depends on the full Gram matrix of the tuple. This decomposition captures the inter-dependencies between multiple inputs while preserving the underlying group symmetry. We assess the effectiveness of our framework across multiple tasks, including polynomial regression, top quark tagging, and moment of inertia matrix prediction. Comparative analysis with baseline methods demonstrates that our model consistently excels in both discovering the underlying symmetry and efficiently learning the corresponding equivariant function.