Drew Dimmery

LG
5 papers
137 citations
Novelty 50%
AI Score 25

5 Papers

LG · Nov 5, 2021
Interpretable Personalized Experimentation

Han Wu, Sarah Tan, Weiwei Li et al.

Black-box heterogeneous treatment effect (HTE) models are increasingly being used to create personalized policies that assign individuals to their optimal treatments. However, they are difficult to understand, and can be burdensome to maintain in a production environment. In this paper, we present a scalable, interpretable personalized experimentation system, implemented and deployed in production at Meta. The system works in a multiple treatment, multiple outcome setting typical at Meta to: (1) learn explanations for black-box HTE models; (2) generate interpretable personalized policies. We evaluate the methods used in the system on publicly available data and Meta use cases, and discuss lessons learnt during the development of the system.

ME · Oct 21, 2020
Efficient Balanced Treatment Assignments for Experimentation

David Arbour, Drew Dimmery, Anup Rao

In this work, we reframe the problem of balanced treatment assignment as optimization of a two-sample test between test and control units. Using this lens, we provide an assignment algorithm that is optimal with respect to the minimum spanning tree test of Friedman and Rafsky (1979). This assignment to treatment groups may be performed exactly in polynomial time. We provide a probabilistic interpretation of this process in terms of the most probable element of designs drawn from a determinantal point process. We provide a novel formulation of estimation as transductive inference and show how the tree structures used in design can also be used in an adjustment estimator. We conclude with a simulation study demonstrating the improved efficacy of our method.
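
As a rough illustration of the minimum spanning tree test mentioned in this abstract, the sketch below builds the MST of the pooled covariates and counts the edges that join units from different groups. More cross-group edges suggest better-mixed groups. This is a minimal sketch, not the authors' assignment algorithm: the function names `mst_edges` and `cross_group_edge_count`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j) index pairs forming a minimum spanning tree.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each node
    best_from = [-1] * n         # tree node that provides that cheapest link
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the node outside the tree that is closest to the tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        edges.append((best_from[j], j))
        in_tree[j] = True
        # Relax distances for the remaining nodes.
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k] = d
                    best_from[k] = j
    return edges


def cross_group_edge_count(points, assignment):
    """Count MST edges whose endpoints received different group labels."""
    return sum(1 for i, j in mst_edges(points) if assignment[i] != assignment[j])


# Hypothetical toy data: four units with one covariate each.
units = [(0.0,), (1.0,), (2.0,), (3.0,)]
alternating = cross_group_edge_count(units, [0, 1, 0, 1])
clustered = cross_group_edge_count(units, [0, 0, 1, 1])
```

On this toy input the alternating assignment makes every MST edge a cross-group edge, while the clustered assignment leaves only one.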

NI · Aug 28, 2020
Real-world Video Adaptation with Reinforcement Learning

Hongzi Mao, Shannon Chen, Drew Dimmery et al.

Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). We evaluate recently proposed RL-based ABR methods in Facebook's web-based video streaming platform. Real-world ABR poses several challenges that require customized designs beyond off-the-shelf RL algorithms: we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.

LG · Nov 2, 2019
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

Samuel Daulton, Shaun Singh, Vashist Avadhanula et al.

Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision-making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many of the existing constraint-aware algorithms consider problems with a single objective (the reward) and a constraint on the reward with respect to a baseline policy. However, many important applications involve multiple competing objectives and auxiliary constraints. In this paper, we propose a novel Thompson sampling algorithm for multi-outcome contextual bandit problems with auxiliary constraints. We empirically evaluate our algorithm on a synthetic problem. Lastly, we apply our method to a real-world video transcoding problem and provide a practical way for navigating the trade-off between safety and performance using Bayesian optimization.
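
To make the idea of Thompson sampling with an auxiliary constraint concrete, here is a deliberately simplified, context-free sketch. It is not the algorithm proposed in the paper: it assumes Bernoulli outcomes with Beta posteriors, a fixed safety floor, and a fallback to a baseline arm when no sampled arm looks safe. The names `constrained_thompson_step`, `update`, `safety_floor`, and `baseline_arm` are hypothetical.

```python
import random


def constrained_thompson_step(reward_post, safety_post, safety_floor, baseline_arm, rng):
    """One round of a toy constrained Thompson sampling rule.

    reward_post / safety_post: per-arm Beta parameters (alpha, beta).
    Draw both outcomes for every arm from its posterior, keep the arms whose
    sampled safety metric clears the floor, and play the arm with the highest
    sampled reward among them. Fall back to the baseline arm otherwise.
    """
    feasible = []
    for arm, ((ra, rb), (sa, sb)) in enumerate(zip(reward_post, safety_post)):
        reward_draw = rng.betavariate(ra, rb)
        safety_draw = rng.betavariate(sa, sb)
        if safety_draw >= safety_floor:
            feasible.append((arm, reward_draw))
    if not feasible:
        return baseline_arm
    return max(feasible, key=lambda pair: pair[1])[0]


def update(posterior, arm, success):
    """Conjugate Beta-Bernoulli update for one observed binary outcome."""
    alpha, beta = posterior[arm]
    posterior[arm] = (alpha + 1, beta) if success else (alpha, beta + 1)


# Hypothetical posteriors: arm 1 has a high reward but looks unsafe.
rng = random.Random(0)
reward_post = [(1, 1000), (1000, 1)]
safety_post = [(1000, 1), (1, 1000)]
chosen = constrained_thompson_step(reward_post, safety_post, 0.5, 0, rng)
```

With these sharply peaked posteriors the safety filter removes arm 1, so the rule plays arm 0 despite its lower sampled reward.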

LG · Jun 9, 2019
Balanced off-policy evaluation in general action spaces

Arjun Sondhi, David Arbour, Drew Dimmery

Estimation of importance sampling weights for off-policy evaluation of contextual bandits often results in imbalance: a mismatch between the desired and the actual distribution of state-action pairs after weighting. In this work we present balanced off-policy evaluation (B-OPE), a generic method for estimating weights which minimize this imbalance. Estimation of these weights reduces to a binary classification problem regardless of action type. We show that minimizing the risk of the classifier implies minimization of imbalance to the desired counterfactual distribution of state-action pairs. The classifier loss is tied to the error of the off-policy estimate, allowing for easy tuning of hyperparameters. We provide experimental evidence that B-OPE improves weighting-based approaches for offline policy evaluation in both discrete and continuous action spaces.
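
The reduction of weight estimation to binary classification can be sketched in a small discrete case. The code below labels observed (context, action) pairs as class 0 and the same contexts paired with the target policy's action as class 1, then uses the classifier's odds as the weight. It is an assumption-laden illustration, not the paper's estimator: a tabular frequency count stands in for the classifier, the target policy is deterministic, and the names `classifier_weights` and `weighted_value` are hypothetical.

```python
from collections import Counter


def classifier_weights(contexts, logged_actions, target_policy):
    """Estimate importance weights from a tabular binary classifier.

    Class 0: observed (context, action) pairs.
    Class 1: the same contexts paired with the target policy's action.
    With equal class sizes, the odds P(class 1 | pair) / P(class 0 | pair)
    estimate the density ratio between the two distributions.
    """
    observed = list(zip(contexts, logged_actions))
    counterfactual = [(c, target_policy(c)) for c in contexts]
    n0 = Counter(observed)
    n1 = Counter(counterfactual)
    # For a frequency classifier, P(class 1 | pair) = n1 / (n0 + n1),
    # so the odds reduce to n1 / n0 for each observed pair.
    return [n1[pair] / n0[pair] for pair in observed]


def weighted_value(rewards, weights):
    """Self-normalized weighted average of the logged rewards."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no logged action matches the target policy")
    return sum(w * r for w, r in zip(weights, rewards)) / total


# Hypothetical log: two contexts, actions chosen uniformly from {0, 1}.
contexts = ["a", "a", "b", "b"]
logged_actions = [0, 1, 0, 1]
rewards = [0.0, 1.0, 0.0, 3.0]
weights = classifier_weights(contexts, logged_actions, lambda c: 1)
estimate = weighted_value(rewards, weights)
```

Here the weights are 0 for logged actions the target policy would not take and 2 for those it would, which matches the inverse of the uniform logging probability of 0.5.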