Pranjal Rawat

h-index: 3

4 papers

3 citations

Novelty: 35%

AI Score: 35

Ranked #122,915 of 201,326 authors (top 61%); #12 in GN (top 36%)

4 Papers

39.6 | GN | Mar 20

Designing Auctions when Algorithms Learn to Bid

Pranjal Rawat

Algorithms increasingly automate bidding in online auctions, raising concerns about tacit bid suppression and revenue shortfalls. Prior work identifies individual mechanisms behind algorithmic bid suppression, but it remains unclear which factors matter most and how they interact, and policy conclusions rest on algorithms unlike those deployed in practice. This paper develops a computational laboratory framework, based on factorial experimental designs and large-scale Monte Carlo simulation, that addresses bid suppression across multiple algorithm classes within a common methodology. Each simulation is treated as a black-box input-output observation; the framework varies inputs and ranks factors by association with outcomes, without explaining algorithms' internal mechanisms. Across six sub-experiments spanning Q-learning, contextual bandits, and budget-constrained pacing, the framework ranks the relative importance of auction format, competitive pressure, learning parameters, and budget constraints on seller revenue. The central finding is that structural market parameters dominate algorithmic design choices. In unconstrained settings, competitive pressure is the strongest predictor of revenue; under budget constraints, budget tightness takes over. The auction-format effect is context-dependent, favouring second-price under learning algorithms but reversing to favour first-price under budget-constrained pacing. Because the optimal format depends on the prevailing bidding technology, no single auction format is universally superior when bidders are algorithms, and applying format recommendations from one algorithm class to another leads to counterproductive design interventions.

27.1 | GN | Mar 15

A Survey of Reinforcement Learning For Economics

Pranjal Rawat

This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists it. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as by a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.

GN | Apr 14, 2025

Who is More Bayesian: Humans or ChatGPT?

Tianshi Mu, Pranjal Rawat, John Rust et al.

We compare the performance of human and artificially intelligent (AI) decision makers in simple binary classification tasks where the optimal decision rule is given by Bayes' rule. We reanalyze choices of human subjects gathered from laboratory experiments conducted by El-Gamal and Grether and by Holt and Smith. We confirm that while, overall, Bayes' rule represents the single best model for predicting human choices, subjects are heterogeneous and a significant share of them make suboptimal choices that reflect judgement biases described by Kahneman and Tversky, including the "representativeness heuristic" (excessive weight on the evidence from the sample relative to the prior) and "conservatism" (excessive weight on the prior relative to the sample). We then compare these results with the choices of AI subjects, gathered from recent versions of large language models (LLMs), including several versions of ChatGPT. These general-purpose generative AI chatbots are not specifically trained to do well in narrow decision-making tasks, but are trained instead as "language predictors" using a large corpus of textual data from the web. We show that ChatGPT is also subject to biases that result in suboptimal decisions. However, we document a rapid evolution in the performance of ChatGPT, from sub-human performance for early versions (ChatGPT 3.5) to superhuman and nearly perfect Bayesian classifications in the latest versions (ChatGPT 4o).
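The two biases named in this abstract can be illustrated with a small sketch. It assumes a hypothetical two-urn task: urn A is chosen with probability `prior_a`, and each draw is a "success" with probability `p_a` from urn A or `p_b` from urn B. The weights `w_prior` and `w_sample` are an illustrative way to encode over- or under-weighting; they are not taken from the paper, and the numbers below are made up.

```python
import math

def posterior_log_odds(prior_a, p_a, p_b, n_draws, n_success,
                       w_prior=1.0, w_sample=1.0):
    """Weighted log odds of urn A versus urn B after observing the sample.

    w_prior = w_sample = 1 reproduces Bayes' rule. The weights are a
    hypothetical device: w_sample > w_prior mimics representativeness,
    w_prior > w_sample mimics conservatism.
    """
    log_prior_odds = math.log(prior_a / (1.0 - prior_a))
    n_fail = n_draws - n_success
    # Binomial coefficients cancel in the likelihood ratio.
    log_lik_ratio = (n_success * math.log(p_a / p_b)
                     + n_fail * math.log((1.0 - p_a) / (1.0 - p_b)))
    return w_prior * log_prior_odds + w_sample * log_lik_ratio

def choose_urn(*args, **kwargs):
    """Classify as urn A when the (weighted) log odds are positive."""
    return "A" if posterior_log_odds(*args, **kwargs) > 0 else "B"

def posterior_prob_a(*args, **kwargs):
    """Convert the log odds into a probability of urn A."""
    return 1.0 / (1.0 + math.exp(-posterior_log_odds(*args, **kwargs)))

# Illustrative numbers only: prior 1/3 on A, 4 successes in 6 draws.
bayes_choice = choose_urn(1/3, 2/3, 1/2, 6, 4)
biased_choice = choose_urn(1/3, 2/3, 1/2, 6, 4, w_sample=3.0)
```

With these inputs the Bayesian classifier picks urn B because the prior outweighs the mildly favourable sample, while a subject who triples the weight on the sample picks urn A.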

GN | Oct 17, 2024

Approximating Auction Equilibria with Reinforcement Learning

Pranjal Rawat

Traditional methods for computing equilibria in auctions become computationally intractable as auction complexity increases, particularly in multi-item and dynamic auctions. This paper introduces a self-play-based reinforcement learning approach that employs advanced algorithms such as Proximal Policy Optimization and Neural Fictitious Self-Play to approximate Bayes-Nash equilibria. This framework allows for continuous action spaces, high-dimensional information states, and delayed payoffs. Through self-play, these algorithms can learn robust and near-optimal bidding strategies in auctions with known equilibria, including those with symmetric and asymmetric valuations, private and interdependent values, and multi-round auctions.
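As an illustration of what an "auction with a known equilibrium" offers as a benchmark, the sketch below uses the textbook first-price auction with independent private values drawn uniformly from [0, 1], whose symmetric Bayes-Nash equilibrium bid is (n-1)/n times the bidder's value. It does not implement the paper's reinforcement learning approach; it only checks by grid search that the equilibrium bid is a best response when rivals follow it. All function names are hypothetical.

```python
def equilibrium_bid(value, n_bidders):
    """Symmetric equilibrium bid with i.i.d. uniform[0, 1] private values."""
    return (n_bidders - 1) / n_bidders * value

def expected_payoff(bid, value, n_bidders):
    """Expected payoff of `bid` when all rivals use equilibrium_bid.

    A rival's bid (n-1)/n * v_j, with v_j uniform on [0, 1], falls below
    `bid` with probability min(1, bid * n / (n-1)). Requires n_bidders >= 2.
    """
    p_beat_one = min(1.0, bid * n_bidders / (n_bidders - 1))
    return (value - bid) * p_beat_one ** (n_bidders - 1)

def best_response(value, n_bidders, grid_size=1001):
    """Grid-search the payoff-maximizing bid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda b: expected_payoff(b, value, n_bidders))

# Three bidders, own value 0.9: the grid optimum should sit at about 0.6.
br = best_response(0.9, 3)
eq = equilibrium_bid(0.9, 3)
```

A learned bidding strategy can be scored the same way, by comparing its bids with `equilibrium_bid` across a range of values.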