Aranyak Mehta

h-index26

8papers

4,204citations

Novelty61%

AI Score47

Ranked #31,980 of 194,257 authors (top 16%)#62 in GT (top 17%)

8 Papers

8.9GTJul 16

Compensation Design

Ioannis Anagnostides, Kshipra Bhawalkar, Christopher Liaw et al.

We introduce compensation design, the problem of designing payment rules that incentivize high-quality contributions in decentralized environments. Here, a budget-constrained principal with a monotone submodular value function aims to design a payment rule, while agents decide whether to opt in or out depending on their private cost. We show that a simple cost-oblivious and anonymous marginal-contribution payment rule guarantees that pure Nash equilibria always exist and attain a price of anarchy (PoA) of at most $2+o_λ(1)$ in the large-market regime ($λ\to 0$) where each individual cost is at most a $λ$ fraction of the budget. We further show that the factor $2$ is unavoidable among deterministic cost-oblivious rules. Surprisingly, we identify a counterexample showing that a payment rule based on the Shapley value may admit no pure Nash equilibria. We then extend our scope to coarse correlated equilibria. This is further motivated by our intractability result: although a pure Nash equilibrium always exists, computing one is PLS-complete. We establish that coarse correlated equilibria also attain a PoA bound of at most $2+o_λ(1)$, and this guarantee in fact extends even under the payment rule induced by the Shapley value. Moreover, we move beyond monotone submodular value functions and binary actions. First, for (monotone) XOS valuations, we show that no oracle-efficient payment rule can attain a PoA bound of $O(n^{1/2 - ε})$. Second, for submodular but non-monotone valuations, we show that a broad class of natural payment rules fails to guarantee a bounded PoA. Finally, we extend compensation design to the setting where each agent has a combinatorial action set. We provide randomized payment rules with logarithmic PoA guarantees for subadditive values, and matching lower bounds that apply even in the single-agent additive-value setting.

31.2LGJul 22, 2024

Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Kaiwen Wang, Rahul Kidambi, Ryan Sullivan et al.

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learn steerable models that effectively trade-off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for multi-objective finetuning.

2.3GTJul 10, 2024Code

Deep Reinforcement Learning for Sequential Combinatorial Auctions

Sai Srivatsa Ravindranath, Zhe Feng, Di Wang et al.

Revenue-optimal auction design is a challenging problem with significant theoretical and practical implications. Sequential auction mechanisms, known for their simplicity and strong strategyproofness guarantees, are often limited by theoretical results that are largely existential, except for certain restrictive settings. Although traditional reinforcement learning methods such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are applicable in this domain, they struggle with computational demands and convergence issues when dealing with large and continuous action spaces. In light of this and recognizing that we can model transitions differentiable for our settings, we propose using a new reinforcement learning framework tailored for sequential combinatorial auctions that leverages first-order gradients. Our extensive evaluations show that our approach achieves significant improvement in revenue over both analytical baselines and standard reinforcement learning algorithms. Furthermore, we scale our approach to scenarios involving up to 50 agents and 50 items, demonstrating its applicability in complex, real-world auction settings. As such, this work advances the computational tools available for auction design and contributes to bridging the gap between theoretical results and practical implementations in sequential auction design.

1.2GTFeb 16, 2023

User Response in Ad Auctions: An MDP Formulation of Long-Term Revenue Optimization

Yang Cai, Zhe Feng, Christopher Liaw et al.

We propose a new Markov Decision Process (MDP) model for ad auctions to capture the user response to the quality of ads, with the objective of maximizing the long-term discounted revenue. By incorporating user response, our model takes into consideration all three parties involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a user-specific click-through rate (CTR) with the CTR changing in the next round according to the set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as a Myerson's auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Leveraging this characterization, we design a sample-efficient and computationally-efficient algorithm which outputs an approximately optimal policy that requires only sample access to the true MDP and the value distributions of the bidders. Finally, we propose a simple mechanism built upon second price auctions with personalized reserve prices and show it can achieve a constant-factor approximation to the optimal long term discounted revenue.

1.4LGFeb 3

Inference-time Unlearning Using Conformal Prediction

Somnath Basu Roy Chowdhury, Rahul Kidambi, Avinava Dubey et al.

Machine unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. Existing unlearning methods, which often provide provable guarantees, typically involve retraining a subset of model parameters based on a forget set. While these approaches show promise in certain scenarios, their underlying assumptions are often challenged in real-world applications -- particularly when applied to generative models. Furthermore, updating parameters using these unlearning procedures often degrades the general-purpose capabilities the model acquired during pre-training. Motivated by these shortcomings, this paper considers the paradigm of inference time unlearning -- wherein, the generative model is equipped with an (approximately correct) verifier that judges whether the model's response satisfies appropriate unlearning guarantees. This paper introduces a framework that iteratively refines the quality of the generated responses using feedback from the verifier without updating the model parameters. The proposed framework leverages conformal prediction to reduce computational overhead and provide distribution-free unlearning guarantees. This paper's approach significantly outperforms existing state-of-the-art methods, reducing unlearning error by up to 93% across challenging unlearning benchmarks.

15.8GTApr 11, 2024

Auctions with LLM Summaries

Kumar Avinava Dubey, Zhe Feng, Rahul Kidambi et al.

We study an auction setting in which bidders bid for placement of their content within a summary generated by a large language model (LLM), e.g., an ad auction in which the display is a summary paragraph of multiple ads. This generalizes the classic ad settings such as position auctions to an LLM generated setting, which allows us to handle general display formats. We propose a novel factorized framework in which an auction module and an LLM module work together via a prediction model to provide welfare maximizing summary outputs in an incentive compatible manner. We provide a theoretical analysis of this framework and synthetic experiments to demonstrate the feasibility and validity of the system together with welfare comparisons.

4.9CLSep 29, 2025

Incentive-Aligned Multi-Source LLM Summaries

Yanchen Jiang, Zhe Feng, Aranyak Mehta

Large language models (LLMs) are increasingly used in modern search and answer systems to synthesize multiple, sometimes conflicting, texts into a single response, yet current pipelines offer weak incentives for sources to be accurate and are vulnerable to adversarial content. We introduce Truthful Text Summarization (TTS), an incentive-aligned framework that improves factual robustness without ground-truth labels. TTS (i) decomposes a draft synthesis into atomic claims, (ii) elicits each source's stance on every claim, (iii) scores sources with an adapted multi-task peer-prediction mechanism that rewards informative agreement, and (iv) filters unreliable sources before re-summarizing. We establish formal guarantees that align a source's incentives with informative honesty, making truthful reporting the utility-maximizing strategy. Experiments show that TTS improves factual accuracy and robustness while preserving fluency, aligning exposure with informative corroboration and disincentivizing manipulation.

7.9LGOct 16, 2020

Learning Robust Algorithms for Online Allocation Problems Using Adversarial Training

Goran Zuzic, Di Wang, Aranyak Mehta et al.

We address the challenge of finding algorithms for online allocation (i.e. bipartite matching) using a machine learning approach. In this paper, we focus on the AdWords problem, which is a classical online budgeted matching problem of both theoretical and practical significance. In contrast to existing work, our goal is to accomplish algorithm design {\em tabula rasa}, i.e., without any human-provided insights or expert-tuned training data beyond specifying the objective and constraints of the optimization problem. We construct a framework based on insights and ideas from game theory, adversarial training and GANs Key to our approach is to generate adversarial examples that expose the weakness of any given algorithm. A unique challenge in our context is to generate complete examples from scratch rather than perturbing given examples and we demonstrate this can be accomplished for the Adwords problem. We use this framework to co-train an algorithm network and an adversarial network against each other until they converge to an equilibrium. This approach finds algorithms and adversarial examples that are consistent with known optimal results. Secondly, we address the question of robustness of the algorithm, namely can we design algorithms that are both strong under practical distributions, as well as exhibit robust performance against adversarial instances. To accomplish this, we train algorithm networks using a mixture of adversarial and practical distributions like power-laws; the resulting networks exhibit a smooth trade-off between the two input regimes.