Shashank Sharma

CV
h-index: 3
Papers: 6
Citations: 20
Novelty: 40%
AI Score: 36

6 Papers

32.6 | AI | Apr 21
OLLM: Options-based Large Language Models

Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options, which can be selected or searched by a downstream policy. Architecturally, OLLM is a lightweight "plug-in" that inserts two layers, an encoder and a decoder, before the output head, allowing almost any pretrained LLM to be converted with minimal additional parameters. We apply OLLM to a 1.7B-parameter backbone (only 1.56% of parameters trainable) trained on OpenMathReasoning and evaluated on OmniMath. State-of-the-art LoRA-adapted baselines peak at 51% final-answer correctness, while OLLM's option set allows up to ~70% under optimal latent selection. We then train a compact policy in the latent space that emits latents to control generation. Operating in a low-dimensional option space makes reward optimization far more sample-efficient and substantially reduces common misalignments (e.g., language switching or degenerate reasoning), as the policy is constrained to options learned during SFT. Crucially, this alignment arises from model structure rather than from additional KL or handcrafted alignment losses. Our results demonstrate that optionized next-token modeling enhances controllability, robustness, and efficiency in math reasoning, and highlight latent-space policy learning as a promising direction for reinforcement learning in LLMs.
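
To make the idea of a latent-indexed option set concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the class `OptionHead`, the additive `latent_shift` standing in for the decoder layer, the random weights, and the `choose_option` helper are all assumptions made for illustration, and the training-time encoder is omitted entirely. The sketch only shows that one hidden state can yield several next-token distributions, one per discrete latent, and that a separate selector can pick among them.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OptionHead:
    """Toy stand-in for an option layer placed before the output head (hypothetical)."""

    def __init__(self, hidden_size, vocab_size, num_options, seed=0):
        rng = random.Random(seed)
        # One shift vector per discrete latent; an assumed, simplified "decoder".
        self.latent_shift = [[rng.gauss(0, 1) for _ in range(hidden_size)]
                             for _ in range(num_options)]
        # Shared output head mapping a hidden vector to vocabulary logits.
        self.output_head = [[rng.gauss(0, 1) for _ in range(hidden_size)]
                            for _ in range(vocab_size)]

    def option_distributions(self, hidden):
        """Return one next-token distribution per latent index."""
        dists = []
        for shift in self.latent_shift:
            decoded = [h + s for h, s in zip(hidden, shift)]
            logits = [sum(w * d for w, d in zip(row, decoded))
                      for row in self.output_head]
            dists.append(softmax(logits))
        return dists


def choose_option(dists, score):
    """Selector stand-in: pick the latent whose distribution scores highest."""
    return max(range(len(dists)), key=lambda z: score(dists[z]))


head = OptionHead(hidden_size=4, vocab_size=5, num_options=3)
dists = head.option_distributions([0.1, -0.2, 0.3, 0.0])
# Example selector: prefer the option that puts the most mass on token 2.
z = choose_option(dists, score=lambda d: d[2])
```

In this toy, the selector searches over only `num_options` choices per step rather than over the whole vocabulary, which is the sense in which a policy acting on latents works in a much smaller space.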

CV | Jul 3, 2024
Advanced Smart City Monitoring: Real-Time Identification of Indian Citizen Attributes

Shubham Kale, Shashank Sharma, Abhilash Khuntia

This project focuses on creating a smart surveillance system for Indian cities that can identify and analyze people's attributes in real time. Using artificial intelligence and machine learning, the system can recognize attributes such as upper-body color, clothing, accessories, and headgear, and analyze behavior through cameras installed around the city.

AI | May 27, 2025
MRSD: Multi-Resolution Skill Discovery for HRL Agents

Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learn these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MRSD), an HRL framework that learns multiple skill encoders at different temporal resolutions in parallel. A high-level manager dynamically selects among these skills, enabling adaptive control strategies over time. We evaluate MRSD on tasks from the DeepMind Control Suite and show that it outperforms prior state-of-the-art skill discovery and HRL methods, achieving faster convergence and higher final performance. Our findings highlight the benefits of integrating multi-resolution skills in HRL, paving the way for more versatile and efficient agents.
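
As a rough illustration of what "different temporal resolutions" could mean in practice, the sketch below builds state pairs separated by several strides from one trajectory and lets a stand-in manager pick a stride. Everything here is assumed for illustration: the abstract does not say how the skill encoders are trained or how the manager chooses, so `skill_pairs`, `multi_resolution_pairs`, `select_resolution`, and the example strides are hypothetical.

```python
def skill_pairs(trajectory, stride):
    """Pairs (state, state `stride` steps later): assumed training data
    for a skill encoder operating at one temporal resolution."""
    return [(trajectory[t], trajectory[t + stride])
            for t in range(len(trajectory) - stride)]


def multi_resolution_pairs(trajectory, strides):
    """Build one pair set per stride so several encoders could train in parallel."""
    return {stride: skill_pairs(trajectory, stride) for stride in strides}


def select_resolution(value_estimates):
    """Manager stand-in: choose the stride with the highest estimated value."""
    return max(value_estimates, key=value_estimates.get)


trajectory = list(range(10))  # toy states 0..9
pairs = multi_resolution_pairs(trajectory, strides=[2, 4])
chosen = select_resolution({2: 0.1, 4: 0.7})
```

A short stride gives many fine-grained pairs and a long stride gives fewer, coarser ones, which mirrors the fine versus coarse motor-skill analogy in the abstract.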

RO | Feb 4, 2025
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents

Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate a 100% success rate (vs. 82% for the baseline) and a 73-step average episode length (vs. 158 steps for the baseline). The method also generalizes to momentum-based control tasks and requires only log N steps for replanning. Theoretical analysis and ablations validate our design choices.
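
The recursive decomposition with a discrete reachability check can be sketched on a toy one-dimensional problem. This is an illustrative reading, not the paper's algorithm: states are integers, `reachable` is an assumed stand-in for the learned reachability check, and `propose_midpoint` is an assumed stand-in for the learned subgoal proposer. The sketch also assumes the proposer always returns a state strictly between start and goal, which guarantees termination here.

```python
def reachable(a, b):
    """Assumed discrete check: b is directly reachable from a if adjacent."""
    return abs(a - b) <= 1


def propose_midpoint(a, b):
    """Assumed subgoal proposer: split the task at the midpoint."""
    return (a + b) // 2


def build_plan(start, goal, reachable, propose):
    """Recursively expand (start, goal) into a left-to-right list of subgoals."""
    if reachable(start, goal):
        return [goal]
    subgoal = propose(start, goal)
    return (build_plan(start, subgoal, reachable, propose)
            + build_plan(subgoal, goal, reachable, propose))


def first_subgoal(start, goal, reachable, propose):
    """Follow only the leftmost branch to find the next reachable subgoal."""
    steps = 0
    while not reachable(start, goal):
        goal = propose(start, goal)
        steps += 1
    return goal, steps


plan = build_plan(0, 16, reachable, propose_midpoint)
next_goal, expansions = first_subgoal(0, 16, reachable, propose_midpoint)
```

With a start and goal 16 steps apart, the full tree yields 16 leaf subgoals, while finding just the next subgoal takes 4 expansions, i.e. log2(16). Under the midpoint assumption, this is one way to see why replanning can need only a logarithmic number of steps.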

CV | Nov 19, 2021
Evaluating Self and Semi-Supervised Methods for Remote Sensing Segmentation Tasks

Chaitanya Patel, Shashank Sharma, Valerie J. Pasquarella et al.

Self- and semi-supervised machine learning techniques leverage unlabeled data for improving downstream task performance. These methods are especially valuable for remote sensing tasks where producing labeled ground truth datasets can be prohibitively expensive but there is easy access to a wealth of unlabeled imagery. We perform a rigorous evaluation of SimCLR, a self-supervised method, and FixMatch, a semi-supervised method, on three remote sensing tasks: riverbed segmentation, land cover mapping, and flood mapping. We quantify performance improvements on these remote sensing segmentation tasks when additional imagery outside of the original supervised dataset is made available for training. We also design experiments to test the effectiveness of these techniques when the test set is domain shifted to sample different geographic areas compared to the training and validation sets. We find that such techniques significantly improve generalization performance when labeled data is limited and there are geographic domain shifts between the training data and the validation/test data.
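
For readers unfamiliar with FixMatch, its unlabeled-data loss can be sketched in a few lines of plain Python. This follows the standard FixMatch recipe (pseudo-label from a weakly augmented view, kept only if confident, then cross-entropy on the strongly augmented view) at the level of whole samples. How the paper adapts this to per-pixel segmentation is not stated in the abstract, so the per-sample form, the function name `fixmatch_unlabeled_loss`, and the toy logits are assumptions; the 0.95 threshold is the commonly used default, not a value taken from this paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Mean cross-entropy on strong views against confident weak-view pseudo-labels.

    Samples whose weak-view confidence is below `threshold` contribute zero,
    but still count in the mean, as in the standard masked formulation.
    """
    if not weak_logits:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        probs_weak = softmax(weak)
        confidence = max(probs_weak)
        if confidence >= threshold:
            pseudo_label = probs_weak.index(confidence)
            total += -math.log(softmax(strong)[pseudo_label])
    return total / len(weak_logits)


# Toy batch: the first sample is confident on its weak view, the second is not.
weak = [[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
strong = [[5.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
loss = fixmatch_unlabeled_loss(weak, strong)
```

Only the first sample passes the confidence mask in this toy batch, so the second sample's disagreement between views does not affect the loss.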

CV | Feb 2, 2018
No Modes left behind: Capturing the data distribution effectively using GANs

Shashank Sharma, Vinay P. Namboodiri

Generative adversarial networks (GANs), while very versatile in realistic image synthesis, are still sensitive to the input distribution. Given a dataset with an imbalanced distribution, the networks are susceptible to missing modes and failing to capture the data distribution. While various methods have been tried to improve the training of GANs, they have not addressed the challenge of covering the full data distribution. Specifically, a generator is not penalized for missing a mode. We show that these methods are therefore still susceptible to not capturing the full data distribution. In this paper, we propose a simple approach that combines an encoder-based objective with novel loss functions for the generator and discriminator, which improves the solution in terms of capturing missing modes. We validate that the proposed method yields substantial improvements through detailed analysis on toy and real datasets. The quantitative and qualitative results demonstrate that the proposed method mitigates the problem of missing modes and improves the training of GANs.