Deeksha Sinha

AI
h-index16
8papers
53citations
Novelty57%
AI Score42

8 Papers

LGNov 11, 2022
Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms

Vashist Avadhanula, Omar Abdul Baki, Hamsa Bastani et al.

We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation trends change over time, affecting which risk models are most reliable. Our system additionally handles production challenges such as changing risk models and novel risk models. We use a contextual bandit to update the calibration in response to such trends. Our approach increases Meta's top-line metric for measuring the effectiveness of its content moderation strategy by 13%.

MEMay 7
Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts

Steven Wilkins-Reeves, Alexandra N. M. Darmon, Deeksha Sinha

In many scientific domains, including experimentation, researchers rely on measurements of proxy outcomes to achieve faster and more frequent reads, especially when the primary outcome of interest is challenging to measure directly. While proxies offer a more readily accessible observation for inference, the ultimate goal is to draw statistical inferences about the primary outcome parameter and proxy data are typically imperfect in some ways. To correct for these imperfections, current statistical inference methods often depend on strict identifying assumptions (such as surrogacy, covariate/label shift, or missingness assumptions). These assumptions can be difficult to validate and may be violated by various additional sources of distribution shift, potentially leading to biased parameter estimates and miscalibrated uncertainty quantification. We introduce an estimate-level framework, inspired by domain adaptation techniques, to empirically calibrate proxy-based inference. This framework models the proxy-primary metric discrepancy as a random effect at the parameter level, estimating its distribution from aggregated historical observations across past domains (e.g., experiments, time periods, or distinct segments). This method avoids the requirement for retaining individual-level response data. Additionally, this adjustment can be layered on top of existing proxy-correction methods (such as prediction-powered inference or importance weighting) to account for additional biases not addressed by those corrections. To manage uncertainty when the number of historical domains is limited, we provide both a method-of-moments estimator and a domain bootstrap procedure. We further validate this approach using publicly available datasets and real-world experiments.

DSMay 27, 2025
Scheduling with Uncertain Holding Costs and its Application to Content Moderation

Caner Gocmen, Thodoris Lykouris, Deeksha Sinha et al.

In content moderation for social media platforms, the cost of delaying the review of a content is proportional to its view trajectory, which fluctuates and is apriori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain holding costs, the two canonical algorithmic principles, instantaneous-cost ($cμ$-rule) and expected-remaining-cost ($cμ/θ$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the regret of OaRC scales as $\tilde{O}(L^{1.5}\sqrt{N})$, where $L$ is the maximum length of a job's holding cost trajectory and $N$ is the system size. This regret bound shows that OaRC achieves asymptotic optimality when the system size $N$ scales to infinity. Moreover, its regret is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.

AINov 17, 2020
Optimizing Offer Sets in Sub-Linear Time

Vivek F. Farias, Andrew A. Li, Deeksha Sinha

Personalization and recommendations are now accepted as core competencies in just about every online setting, ranging from media platforms to e-commerce to social networks. While the challenge of estimating user preferences has garnered significant attention, the operational problem of using such preferences to construct personalized offer sets to users is still a challenge, particularly in modern settings where a massive number of items and a millisecond response time requirement mean that even enumerating all of the items is impossible. Faced with such settings, existing techniques are either (a) entirely heuristic with no principled justification, or (b) theoretically sound, but simply too slow to work. Thus motivated, we propose an algorithm for personalized offer set optimization that runs in time sub-linear in the number of items while enjoying a uniform performance guarantee. Our algorithm works for an extremely general class of problems and models of user choice that includes the mixed multinomial logit model as a special case. We achieve a sub-linear runtime by leveraging the dimensionality reduction from learning an accurate latent factor model, along with existing sub-linear time approximate near neighbor algorithms. Our algorithm can be entirely data-driven, relying on samples of the user, where a `sample' refers to the user interaction data typically collected by firms. We evaluate our approach on a massive content discovery dataset from Outbrain that includes millions of advertisements. Results show that our implementation indeed runs fast and with increased performance relative to existing fast heuristics.

LGNov 3, 2020
Multi-armed Bandits with Cost Subsidy

Deeksha Sinha, Karthik Abinav Sankararama, Abbas Kazerouni et al.

In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, intelligent SMS routing problem and ad audience optimization problem faced by several businesses (especially online platforms), and show how our problem uniquely captures key features of these applications. We show that naive generalizations of existing MAB algorithms like Upper Confidence Bound and Thompson Sampling do not perform well for this problem. We then establish a fundamental lower bound on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem. We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm. Lastly, we perform extensive numerical simulations to understand the behavior of a suite of algorithms for various instances and recommend a practical guide to employ different algorithms.

IRJun 14, 2020
Multi-Purchase Behavior: Modeling, Estimation and Optimization

Theja Tulabandhula, Deeksha Sinha, Saketh Reddy Karra et al.

We study the problem of modeling purchase of multiple products and utilizing it to display optimized recommendations for online retailers and e-commerce platforms. We present a parsimonious multi-purchase family of choice models called the Bundle-MVL-K family, and develop a binary search based iterative strategy that efficiently computes optimized recommendations for this model. We establish the hardness of computing optimal recommendation sets, and derive several structural properties of the optimal solution that aid in speeding up computation. This is one of the first attempts at operationalizing multi-purchase class of choice models. We show one of the first quantitative links between modeling multiple purchase behavior and revenue gains. The efficacy of our modeling and optimization techniques compared to competing solutions is shown using several real world datasets on multiple metrics such as model fitness, expected revenue gains and run-time reductions. For example, the expected revenue benefit of taking multiple purchases into account is observed to be $\sim5\%$ in relative terms for the Ta Feng and UCI shopping datasets, when compared to the MNL model for instances with $\sim 1500$ products. Additionally, across $6$ real world datasets, the test log-likelihood fits of our models are on average $17\%$ better in relative terms. Our work contributes to the study multi-purchase decisions, analyzing consumer demand and the retailers optimization problem. The simplicity of our models and the iterative nature of our optimization technique allows practitioners meet stringent computational constraints while increasing their revenues in practical recommendation applications at scale, especially in e-commerce platforms and other marketplaces.

MEJun 11, 2020
The Limits to Learning a Diffusion Model

Jackie Baek, Vivek F. Farias, Andreea Georgescu et al.

This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models, including the Bass model (used in modeling consumer adoption) and the SIR model (used in modeling epidemics). We show that one cannot hope to learn such models until quite late in the diffusion. Specifically, we show that the time required to collect a number of observations that exceeds our sample complexity lower bounds is large. For Bass models with low innovation rates, our results imply that one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate has peaked. This lower bound in estimation further translates into a lower bound in regret for decision-making in epidemic interventions. Our results formalize the challenge of accurate forecasting and highlight the importance of incorporating additional data sources. To this end, we analyze the benefit of a seroprevalence study in an epidemic, where we characterize the size of the study needed to improve SIR model estimation. Extensive empirical analyses on product adoption and epidemic data support our theoretical findings.

AIMar 6, 2020
Optimizing Revenue while showing Relevant Assortments at Scale

Theja Tulabandhula, Deeksha Sinha, Saketh Karra

Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, the optimization process becomes difficult when imposing constraints on the collection of relevant assortments based on insights by store-managers and historically well-performing assortments. We design fast and flexible algorithms based on variations of binary search that find the (approximately) optimal assortment in this difficult regime. In particular, we revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. We speed up the comparison steps using advances in similarity search in the field of information retrieval/machine learning. For an arbitrary collection of assortments, our algorithms can find a solution in time that is sub-linear in the number of assortments, and for the simpler case of cardinality constraints - linear in the number of items (existing methods are quadratic or worse). Empirical validations using a real world dataset (in addition to experiments using semi-synthetic data based on the Billion Prices dataset and several retail transaction datasets) show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied).