Suyash Gupta

ML
h-index66
12papers
388citations
Novelty58%
AI Score49

12 Papers

69.1LGApr 7
Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo

Jelena Markovic-Voronov, Wenhui Zhu, Bo Long et al.

We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates resample-move updates with Metropolis-Hastings rejuvenation and supports block-wise generation, subsuming common decoding strategies such as temperature sampling and power-tempered objectives. Empirical results across three 7B models show significant gains. On code generation (HumanEval), our method improves base performance by up to 54.9% and surpasses the strongest sampling baselines by 9.1%-15.3%. On mathematical reasoning (MATH500), it achieves gains of up to 8.8%. Notably, it reaches 87.8% on HumanEval and 78.4% on MATH500 with Qwen2.5-7B, consistently outperforming the reinforcement learning method GRPO.

DCFeb 24
Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning

Nihal Balivada, Shrey Gupta, Shashank Shreedhar Bhatt et al.

Federated Learning (FL) enables a distributed client-server architecture where multiple clients collaboratively train a global Machine Learning (ML) model without sharing sensitive local data. However, FL often results in lower accuracy than traditional ML algorithms due to statistical heterogeneity across clients. Prior works attempt to address this by using model updates, such as loss and bias, from client models to select participants that can improve the global model's accuracy. However, these updates neither accurately represent a client's heterogeneity nor are their selection methods deterministic. We mitigate these limitations by introducing Terraform, a novel client selection methodology that uses gradient updates and a deterministic selection algorithm to select heterogeneous clients for retraining. This bi-pronged approach allows Terraform to achieve up to 47 percent higher accuracy over prior works. We further demonstrate its efficiency through comprehensive ablation studies and training time analyses, providing strong justification for the robustness of Terraform.

LGFeb 25
Support Tokens, Stability Margins, and a New Foundation for Robust LLMs

Deepak Agarwal, Dhyey Dharmendrakumar Mavani, Suyash Gupta et al.

Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much like how classical PCA is extended to probabilistic PCA. However, this re-formulation reveals a surprising and deeper structural insight: due to a change-of-variables phenomenon, a barrier constraint emerges on the self-attention parameters. This induces a highly structured geometry on the token space, providing theoretical insights into the dynamics of LLM decoding. This reveals a boundary where attention becomes ill-conditioned, leading to a margin interpretation similar to classical support vector machines. Just like support vectors, this naturally gives rise to the concept of support tokens. Furthermore, we show that LLMs can be interpreted as a stochastic process over the power set of the token space, providing a rigorous probabilistic framework for sequence modeling. We propose a Bayesian framework and derive a MAP estimation objective that requires only a minimal modification to standard LLM training: the addition of a smooth log-barrier penalty to the usual cross-entropy loss. We demonstrate that this provides more robust models without sacrificing out-of-sample accuracy and that it is straightforward to incorporate in practice.

MLMar 25, 2024
Predictive Inference in Multi-environment Scenarios

John C. Duchi, Suyash Gupta, Kuanhao Jiang et al.

We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We investigate two types of coverage suitable for these problems, extending the jackknife and split-conformal methods to show how to obtain distribution-free coverage in such non-traditional, potentially hierarchical data-generating scenarios. We demonstrate a novel resizing method to adapt to problem difficulty, which applies both to existing approaches for predictive inference and the methods we develop; this reduces prediction set sizes using limited information from the test environment, a key to the methods' practical performance, which we evaluate through neurochemical sensing and species classification datasets. Our contributions also include extensions for settings with non-real-valued responses, a theory of consistency for predictive inference in these general problems, and insights on the limits of conditional coverage.

LGApr 10, 2024
Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions

Shrey Gupta, Yongbee Park, Jianzhao Bi et al.

Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF have a 19.34% improvement over the baselines. We additionally support our experiments with qualitative findings.

DBFeb 3, 2022
Dissecting BFT Consensus: In Trusted Components we Trust!

Suyash Gupta, Sajjad Rahnama, Shubham Pandey et al.

The growing interest in reliable multi-party applications has fostered widespread adoption of Byzantine Fault-Tolerant (BFT) consensus protocols. Existing BFT protocols need f more replicas than Paxos-style protocols to prevent equivocation attacks. Trust-BFT protocols instead seek to minimize this cost by making use of trusted components at replicas. This paper makes two contributions. First, we analyze the design of existing Trust-BFT protocols and uncover three fundamental limitations that preclude most practical deployments. Some of these limitations are fundamental, while others are linked to the state of trusted components today. Second, we introduce a novel suite of consensus protocols, FlexiTrust, that attempts to sidestep these issues. We show that our FlexiTrust protocols achieve up to 185% more throughput than their Trust-BFT counterparts.

MLJan 20, 2022
Predictive Inference with Weak Supervision

Maxime Cauchois, Suyash Gupta, Alnur Ali et al.

The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly-labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets -- sets that cover a true label with a prescribed probability, independent of the underlying distribution -- using weakly labeled data. To do so, we introduce a (necessary) new notion of coverage and predictive validity, then develop several application scenarios, providing efficient algorithms for classification and several large-scale structured prediction problems. We corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets through several experiments.

DBJul 27, 2021
RingBFT: Resilient Consensus over Sharded Ring Topology

Sajjad Rahnama, Suyash Gupta, Rohan Sogani et al.

The recent surge in federated data management applications has brought forth concerns about the security of underlying data and the consistency of replicas in the presence of malicious attacks. A prominent solution in this direction is to employ a permissioned blockchain framework that is modeled around traditional Byzantine Fault-Tolerant (BFT) consensus protocols. Any federated application expects its data to be globally scattered to achieve faster access. But, prior works have shown that traditional BFT protocols are slow. This has led to the rise of sharded-replicated blockchains. Existing BFT protocols for these sharded blockchains are efficient if client transactions require access to a single-shard, but face performance degradation if there is a cross-shard transaction that requires access to multiple shards. As cross-shard transactions are common, to resolve this dilemma, we present RingBFT, a novel meta-BFT protocol for sharded blockchains. RingBFT requires shards to adhere to the ring order, and follow the principle of process, forward, and re-transmit while ensuring the communication between shards is linear. Our evaluation of RingBFT against state-of-the-art sharding BFT protocols illustrates that RingBFT achieves up to 18x higher throughput, gracefully scales to nearly 500 globally distributed nodes, and achieves a peak throughput of 1.2 million transactions per second.

DBJul 24, 2021
Blockchain Transaction Processing

Suyash Gupta, Mohammad Sadoghi

A blockchain is an append-only linked-list of blocks, which is maintained at each participating node. Each block records a set of transactions and their associated metadata. Blockchain transactions act on the identical ledger data stored at each node. Blockchain was first perceived by Satoshi Nakamoto as a peer-to-peer digital-commodity (also known as crypto-currency) exchange system. Blockchains received traction due to their inherent property of immutability-once a block is accepted, it cannot be reverted.

MEMay 7, 2021
The $s$-value: evaluating stability with respect to distributional shifts

Suyash Gupta, Dominik Rothenhäusler

Common statistical measures of uncertainty such as $p$-values and confidence intervals quantify the uncertainty due to sampling, that is, the uncertainty due to not observing the full population. However, sampling is not the only source of uncertainty. In practice, distributions change between locations and across time. This makes it difficult to gather knowledge that transfers across data sets. We propose a measure of instability that quantifies the distributional instability of a statistical parameter with respect to Kullback-Leibler divergence, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. In addition, we quantify the instability of parameters with respect to directional or variable-specific shifts. Measuring instability with respect to directional shifts can be used to detect the type of shifts a parameter is sensitive to. We discuss how such knowledge can inform data collection for improved estimation of statistical parameters under shifted distributions. We evaluate the performance of the proposed measure on real data and show that it can elucidate the distributional instability of a parameter with respect to certain shifts and can be used to improve estimation accuracy under shifted distributions.

MLAug 10, 2020
Robust Validation: Confident Predictions Even When Distributions Shift

Maxime Cauchois, Suyash Gupta, Alnur Ali et al.

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.

MLApr 21, 2020
Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction

Maxime Cauchois, Suyash Gupta, John Duchi

We develop conformal prediction methods for constructing valid predictive confidence sets in multiclass and multilabel problems without assumptions on the data generating distribution. A challenge here is that typical conformal prediction methods---which give marginal validity (coverage) guarantees---provide uneven coverage, in that they address easy examples at the expense of essentially ignoring difficult examples. By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide (asymptotically optimal) conditional coverage for both multiclass and multilabel prediction problems. To address the potential challenge of exponentially large confidence sets in multilabel prediction, we build tree-structured classifiers that efficiently account for interactions between labels. Our methods can be bolted on top of any classification model---neural network, random forest, boosted tree---to guarantee its validity. We also provide an empirical evaluation, simultaneously providing new validation methods, that suggests the more robust coverage of our confidence sets.