Nicolás Della Penna

CY
5papers
107citations
Novelty57%
AI Score41

5 Papers

MLMar 10
What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

Nicolás Della Penna

Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow, treatment learning for a future direct-control regime, or anytime-valid uncertainty for one of those targets. These objectives need not agree. We formalize this objective-choice problem, identify the direct-control regime in which recommendation and treatment objectives collapse, and show by example that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm that performs IV inversion only after matrix certification and otherwise returns full-range but honest structural intervals. BRACE delivers simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification. The experiments show that safety appears as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich contexts, we also derive an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.

GTMar 31, 2021
Towards Prior-Free Approximately Truthful One-Shot Auction Learning via Differential Privacy

Daniel Reusche, Nicolás Della Penna

Designing truthful, revenue maximizing auctions is a core problem of auction design. Multi-item settings have long been elusive. Recent work (arXiv:1706.03459) introduces effective deep learning techniques to find such auctions for the prior-dependent setting, in which distributions about bidder preferences are known. One remaining problem is to obtain priors in a way that excludes the possibility of manipulating the resulting auctions. Using techniques from differential privacy for the construction of approximately truthful mechanisms, we modify the RegretNet approach to be applicable to the prior-free setting. In this more general setting, no distributional information is assumed, but we trade this property for worse performance. We present preliminary empirical results and qualitative analysis for this work in progress.

CYJan 17, 2018
An Experimental Study of Cryptocurrency Market Dynamics

Peter M Krafft, Nicolás Della Penna, Alex Pentland

As cryptocurrencies gain popularity and credibility, marketplaces for cryptocurrencies are growing in importance. Understanding the dynamics of these markets can help to assess how viable the cryptocurrnency ecosystem is and how design choices affect market behavior. One existential threat to cryptocurrencies is dramatic fluctuations in traders' willingness to buy or sell. Using a novel experimental methodology, we conducted an online experiment to study how susceptible traders in these markets are to peer influence from trading behavior. We created bots that executed over one hundred thousand trades costing less than a penny each in 217 cryptocurrencies over the course of six months. We find that individual "buy" actions led to short-term increases in subsequent buy-side activity hundreds of times the size of our interventions. From a design perspective, we note that the design choices of the exchange we study may have promoted this and other peer influence effects, which highlights the potential social and economic impact of HCI in the design of digital institutions.

CYAug 5, 2016
Human collective intelligence as distributed Bayesian inference

Peter M. Krafft, Julia Zheng, Wei Pan et al.

Collective intelligence is believed to underly the remarkable success of human society. The formation of accurate shared beliefs is one of the key components of human collective intelligence. How are accurate shared beliefs formed in groups of fallible individuals? Answering this question requires a multiscale analysis. We must understand both the individual decision mechanisms people use, and the properties and dynamics of those mechanisms in the aggregate. As of yet, mathematical tools for such an approach have been lacking. To address this gap, we introduce a new analytical framework: We propose that groups arrive at accurate shared beliefs via distributed Bayesian inference. Distributed inference occurs through information processing at the individual level, and yields rational belief formation at the group level. We instantiate this framework in a new model of human social decision-making, which we validate using a dataset we collected of over 50,000 users of an online social trading platform where investors mimic each others' trades using real money in foreign exchange and other asset markets. We find that in this setting people use a decision mechanism in which popularity is treated as a prior distribution for which decisions are best to make. This mechanism is boundedly rational at the individual level, but we prove that in the aggregate implements a type of approximate "Thompson sampling"---a well-known and highly effective single-agent Bayesian machine learning algorithm for sequential decision-making. The perspective of distributed Bayesian inference therefore reveals how collective rationality emerges from the boundedly rational decision mechanisms people use.

MLFeb 9, 2016
Compliance-Aware Bandits

Nicolás Della Penna, Mark D. Reid, David Balduzzi

Motivated by clinical trials, we study bandits with observable non-compliance. At each step, the learner chooses an arm, after, instead of observing only the reward, it also observes the action that took place. We show that such noncompliance can be helpful or hurtful to the learner in general. Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret. We present hybrid algorithms that maintain regret bounds up to a multiplicative factor and can incorporate compliance information. Simulations based on real data from the International Stoke Trial show the practical potential of these algorithms.