Saksham Bassi

h-index: 3

4 papers

43 citations

Novelty: 43%

AI Score: 29

Ranked #142,132 of 194,257 authors (top 73%); #25,343 in CL (top 82%)

4 Papers

12.2 | LG | Jul 5

Geometry of Ordinal Representations in Language Models

Saksham Bassi, Sharvi Tomar

Recent work has shown that language models represent character counts on curved 1D manifolds, with attention heads performing geometric transformations to enable computation. We test whether this generalizes across four ordinal tasks (bracket depth, indentation, table position, numeric magnitude) in Gemma-2-2B, Gemma-2-9B, and Qwen3-4B. We find that 1D manifolds with place-cell feature tiling emerge for tasks where the ordinal variable is locally computable from token identity, while tasks requiring cross-position integration or semantic extraction produce higher-dimensional or incoherent representations. Geometric computation is architecture-dependent: Qwen3-4B shows substantially stronger twisting than Gemma models for indentation, and its indentation twisters preserve ordinal order, unlike its numeric twisters. Activation patching confirms that the identified manifold subspaces concentrate task-relevant information, with manifold-direction ablation causing dramatically larger probe accuracy drops than random-direction controls.
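The manifold-versus-random ablation comparison can be illustrated with a toy example. This is a minimal sketch, not the authors' code: it assumes synthetic activations in which an ordinal value 0..9 lies on a curved arc inside a hypothetical 2-D subspace of a 64-dimensional space, and it uses the held-out R² of a linear probe as a stand-in for probe accuracy. Projecting out the manifold subspace should destroy the probe, while projecting out a random subspace of the same dimension should barely affect it.

```python
import numpy as np

def ablate_subspace(acts, basis):
    """Remove the component of each row of `acts` that lies in span(basis).

    acts: (n, d) activations; basis: (d, k) matrix with orthonormal columns.
    """
    return acts - (acts @ basis) @ basis.T

def random_basis(d, k, rng):
    """Orthonormal basis of a random k-dimensional subspace, via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def probe_r2(acts, y):
    """Held-out R^2 of a least-squares linear probe with a bias term."""
    n = len(y)
    split = n // 2
    x = np.hstack([acts, np.ones((n, 1))])
    w, *_ = np.linalg.lstsq(x[:split], y[:split], rcond=None)
    resid = y[split:] - x[split:] @ w
    return 1.0 - np.mean(resid ** 2) / np.var(y[split:])

# Hypothetical toy data: an ordinal value placed on a quarter-circle arc.
rng = np.random.default_rng(0)
n, d = 2000, 64
y = rng.integers(0, 10, size=n).astype(float)   # toy ordinal variable, e.g. a depth 0..9
theta = y * (np.pi / 2) / 9                     # position along the arc
manifold = random_basis(d, 2, rng)              # hypothetical plane containing the arc
acts = np.outer(np.cos(theta), manifold[:, 0]) + np.outer(np.sin(theta), manifold[:, 1])
acts += 0.05 * rng.standard_normal((n, d))      # small isotropic noise in every direction

print("clean            :", round(probe_r2(acts, y), 3))
print("manifold ablated :", round(probe_r2(ablate_subspace(acts, manifold), y), 3))
print("random ablated   :", round(probe_r2(ablate_subspace(acts, random_basis(d, 2, rng)), y), 3))
```

The random control uses the same subspace dimension as the manifold so that the comparison isolates *which* directions are removed rather than *how many*.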

2.0 | LG | Mar 1, 2023 | Code

Learning high-dimensional causal effect

Aayush Agarwal, Saksham Bassi

The scarcity of high-dimensional causal inference datasets restricts the exploration of complex deep models. In this work, we propose a method to generate a high-dimensional synthetic causal dataset. The synthetic data simulates a causal effect using the MNIST dataset with Bernoulli treatment values, which makes it possible to study a variety of models for causal effect estimation. We experiment on this dataset using the Dragonnet architecture (Shi et al., 2019) and modified architectures that explore different types of initial neural network layers, and observe that the modified architectures produce better estimates. In particular, residual and transformer models estimate the treatment effect closely without the targeted regularization introduced by Shi et al. (2019).
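Why a synthetic dataset helps can be shown with a small simulation: because the data are generated, the true treatment effect is known, and any estimator can be scored against it. The sketch below is a hypothetical generator, not the one from the paper. It assumes a digit label in 0..9 stands in for an MNIST image, draws a Bernoulli treatment whose probability depends on the digit (so the digit is a confounder), and compares a naive difference in means with a simple stratified estimate.

```python
import numpy as np

def simulate(n, rng):
    """Hypothetical generator: a digit label in 0..9 stands in for an MNIST image.

    Returns the covariate, Bernoulli treatment, observed outcome, and the true
    per-unit treatment effect (known only because the data are synthetic).
    """
    digit = rng.integers(0, 10, size=n)
    propensity = 0.1 + 0.8 * digit / 9      # confounding: larger digits are treated more often
    t = rng.binomial(1, propensity)         # Bernoulli treatment assignment
    mu0 = 0.5 * digit                       # untreated outcome also depends on the digit
    tau = 1.0 + 0.2 * digit                 # heterogeneous treatment effect
    y0 = mu0 + rng.normal(0.0, 0.1, size=n)
    y = y0 + tau * t                        # only one potential outcome is observed per unit
    return digit, t, y, tau

def naive_ate(t, y):
    """Difference in means between groups, ignoring the confounder."""
    return y[t == 1].mean() - y[t == 0].mean()

def stratified_ate(x, t, y):
    """Adjust for a discrete confounder by averaging within-stratum differences."""
    est = 0.0
    for v in np.unique(x):
        m = x == v
        est += m.mean() * (y[m & (t == 1)].mean() - y[m & (t == 0)].mean())
    return est

rng = np.random.default_rng(0)
x, t, y, tau = simulate(100_000, rng)
print("true ATE      :", round(tau.mean(), 3))
print("naive estimate:", round(naive_ate(t, y), 3))
print("stratified    :", round(stratified_ate(x, t, y), 3))
```

Stratification only works here because the confounder is a single discrete label; when the covariate is a full image, simple stratification is no longer possible, which is where learned estimators such as Dragonnet come in.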

1.2 | AS | Sep 8, 2023 | Code

End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining

Saksham Bassi, Giulio Duregon, Siddhartha Jalagam et al.

The state of the art (SOTA) in transcription of disfluent and conversational speech has in recent years favored two-stage models, with separate transcription and cleaning stages. We believe that previous attempts at end-to-end disfluency removal have fallen short because of the representational advantage that large-scale language model pretraining has given to lexical models. Until recently, the high dimensionality and limited availability of large audio datasets inhibited the development of large-scale self-supervised pretraining objectives for learning effective audio representations, giving a relative advantage to the two-stage approach, which uses pretrained representations for lexical tokens. In light of recent successes in large-scale audio pretraining, we revisit the performance comparison between two-stage and end-to-end models. We find that audio-based language models pretrained using weak self-supervised objectives match or exceed the performance of similarly trained two-stage models, and that the choice of pretraining objective substantially affects a model's ability to be adapted to the disfluency removal task.

12.6 | CL | Apr 24, 2024

Generalization Measures for Zero-Shot Cross-Lingual Transfer

Saksham Bassi, Duygu Ataman, Kyunghyun Cho

A model's capacity to generalize its knowledge to interpret unseen inputs with different characteristics is crucial for building robust and reliable machine learning systems. Language model evaluation tasks lack metrics that capture model generalization; instead, a model's applicability in a new setting is measured by task- and language-specific downstream performance, which is unavailable for many languages and tasks. In this paper, we explore a set of efficient and reliable measures that provide more information about the generalization capability of language models in cross-lingual zero-shot settings. In addition to traditional measures such as the variance of parameters after training and the distance from initialization, we measure how effectively the sharpness of the loss landscape captures success in cross-lingual transfer, and we propose a novel and stable algorithm to reliably compute the sharpness of a model optimum that correlates with generalization.
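The notions of sharpness and distance from initialization can be made concrete on a toy loss. The sketch below is a generic random-perturbation sharpness estimate, not the novel algorithm proposed in the paper; the quadratic losses, the radius `eps=0.05`, and the sample count are illustrative assumptions. It measures the largest loss increase found among random perturbations of norm `eps` around an optimum, so a minimum with higher curvature scores as sharper.

```python
import numpy as np

def sharpness_mc(loss, theta, eps=0.05, n_samples=256, seed=0):
    """Random-perturbation estimate of max_{||delta||=eps} loss(theta+delta) - loss(theta).

    Sampling only sees the perturbations it draws, so the result is a lower
    bound on the true worst-case increase on the eps-sphere.
    """
    rng = np.random.default_rng(seed)
    base = loss(theta)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.standard_normal(theta.shape)
        delta *= eps / np.linalg.norm(delta)    # random direction scaled to radius eps
        worst = max(worst, loss(theta + delta) - base)
    return worst

def distance_from_init(theta, theta0):
    """Euclidean distance between trained and initial parameters."""
    return float(np.linalg.norm(theta - theta0))

def quadratic(h):
    """Toy loss 0.5 * theta^T H theta with its minimum at the origin."""
    return lambda th: 0.5 * th @ h @ th

d = 10
flat = quadratic(np.diag(np.linspace(0.1, 1.0, d)))    # low-curvature minimum
sharp = quadratic(np.diag(np.linspace(1.0, 50.0, d)))  # high-curvature minimum
print("flat :", sharpness_mc(flat, np.zeros(d)))
print("sharp:", sharpness_mc(sharp, np.zeros(d)))
```

For a quadratic the exact worst case is `0.5 * eps**2 * lambda_max`, which makes it easy to see how far the sampled estimate falls short; in high-dimensional models, pure random sampling becomes increasingly loose, which is one motivation for more reliable sharpness algorithms.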