Mahesh Godavarti

LG
h-index1
4papers
6citations
Novelty44%
AI Score40

4 Papers

LGJun 10, 2025Code
JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset

Mahesh Godavarti

Transformers have demonstrated remarkable success in sequence modeling, yet effectively incorporating positional information remains a challenging and active area of research. In this paper, we introduce JoFormer, a journey-based Transformer architecture grounded in a recently proposed non-commutative algebra for composing transformations across positions. JoFormer represents relative positions through learnable directional transforms that are sequentially composed along the input, thereby extending and generalizing existing approaches based on relative position representations. We derive the JoFormer attention mechanism from first principles and show that it subsumes standard methods such as rotary transformations as special cases. To evaluate its effectiveness, we compare JoFormer to the RoFormer baseline on the Tiny Shakespeare character-level language modeling task. Our results demonstrate that JoFormer consistently achieves lower perplexity and faster convergence, highlighting the advantages of its more expressive, journey-based treatment of position. Notably, the per-token JoFormer is still a primitive, conceptual variant with layer-independent angles, yet it already demonstrates strong performance-underscoring its promise as a proof of concept for more expressive architectures. We conclude by discussing how JoFormer offers a principled approach to integrating positional structure into Transformer architectures. The code used in this work is available at https://github.com/mahesh-godavarti/joformer.

LGJun 4, 2025Code
Directional Non-Commutative Monoidal Embeddings for MNIST

Mahesh Godavarti

We present an empirical validation of the directional non-commutative monoidal embedding framework recently introduced in prior work~\cite{Godavarti2025monoidal}. This framework defines learnable compositional embeddings using distinct non-commutative operators per dimension (axis) that satisfy an interchange law, generalizing classical one-dimensional transforms. Our primary goal is to verify that this framework can effectively model real data by applying it to a controlled, well-understood task: image classification on the MNIST dataset~\cite{lecun1998gradient}. A central hypothesis for why the proposed monoidal embedding works well is that it generalizes the Discrete Fourier Transform (DFT)~\cite{oppenheim1999discrete} by learning task-specific frequency components instead of using fixed basis frequencies. We test this hypothesis by comparing learned monoidal embeddings against fixed DFT-based embeddings on MNIST. The results show that as the embedding dimensionality decreases (e.g., from 32 to 8 to 2), the performance gap between the learned monoidal embeddings and fixed DFT-based embeddings on MNIST grows increasingly large. This comparison is used as an analytic tool to explain why the framework performs well: the learnable embeddings can capture the most discriminative spectral components for the task. Overall, our experiments confirm that directional non-commutative monoidal embeddings are highly effective for representing image data, offering a compact learned representation that retains high task performance. The code used in this work is available at https://github.com/mahesh-godavarti/directional_composition_mnist.

LGMay 21, 2025
Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning

Mahesh Godavarti

We introduce a new algebraic structure for multi-dimensional compositional embeddings, built on directional non-commutative monoidal operators. The core contribution of this work is this novel framework, which exhibits appealing theoretical properties (associativity along each dimension and an interchange law ensuring global consistency) while remaining compatible with modern machine learning architectures. Our construction defines a distinct composition operator circ_i for each axis i, ensuring associative combination along each axis without imposing global commutativity. Importantly, all axis-specific operators commute with one another, enforcing a global interchange law that enables consistent crossaxis compositions. This is, to our knowledge, the first approach that provides a common foundation that generalizes classical sequence-modeling paradigms (e.g., structured state-space models (SSMs) and transformer self-attention) to a unified multi-dimensional framework. For example, specific one-dimensional instances of our framework can recover the familiar affine transformation algebra, vanilla self-attention, and the SSM-style recurrence. The higher-dimensional generalizations naturally support recursive, structure-aware operations in embedding spaces. We outline several potential applications unlocked by this structure-including structured positional encodings in Transformers, directional image embeddings, and symbolic modeling of sequences or grids-indicating that it could inform future deep learning model designs. We formally establish the algebraic properties of our framework and discuss efficient implementations. Finally, as our focus is theoretical, we include no experiments here and defer empirical validation to future work, which we plan to undertake.

LGMay 30, 2025
Directional Non-Commutative Monoidal Structures with Interchange Law via Commutative Generators

Mahesh Godavarti

We introduce a novel framework consisting of a class of algebraic structures that generalize one-dimensional monoidal systems into higher dimensions by defining per-axis composition operators subject to non-commutativity and a global interchange law. These structures, defined recursively from a base case of vector-matrix pairs, model directional composition in multiple dimensions while preserving structural coherence through commutative linear operators. We show that the framework that unifies several well-known linear transforms in signal processing and data analysis. In this framework, data indices are embedded into a composite structure that decomposes into simpler components. We show that classic transforms such as the Discrete Fourier Transform (DFT), the Walsh transform, and the Hadamard transform are special cases of our algebraic structure. The framework provides a systematic way to derive these transforms by appropriately choosing vector and matrix pairs. By subsuming classical transforms within a common structure, the framework also enables the development of learnable transformations tailored to specific data modalities and tasks.