A Bayesian Model for Online Activity Sample Sizes
This work addresses sample size prediction for online activities such as A/B testing. Its contribution is incremental, since it builds on existing Bayesian approaches for heterogeneous populations.
The paper tackles the problem of predicting how many individuals will initiate an activity over time, such as users installing a software update, while accounting for heterogeneity in participation rates. It presents a Bayesian method that predicts the number of additional participants in a subsequent period from observations in an initial period, and illustrates its performance in online experimentation.
In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period: for example, the number of users who will install a software update, or the number of customers who will use a new feature on a website or participate in an A/B test. In practical settings, individuals are heterogeneous with regard to the distribution of time until they initiate. For this reason it is inappropriate to assume that the numbers of new individuals observed on successive days are identically distributed. Given observations of the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.
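To make the heterogeneity argument concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical beta-geometric model: each individual initiates on any given day with their own probability p, and p varies across the population as Beta(a, b). The function names, the parameters a and b, and the plug-in forecast are all illustrative assumptions. Under this model the expected number of first-time participants falls from day to day, so successive daily counts are not identically distributed.

```python
from math import lgamma, exp


def log_beta(x, y):
    """Natural log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def prob_not_yet_initiated(t, a, b):
    """P(an individual has not initiated by the end of day t).

    Assumed model (hypothetical): daily initiation probability p ~ Beta(a, b),
    so E[(1 - p)^t] = B(a, b + t) / B(a, b).
    """
    return exp(log_beta(a, b + t) - log_beta(a, b))


def expected_new_by_day(n_population, n_days, a, b):
    """Expected number of first-time participants on each of days 1..n_days."""
    return [
        n_population
        * (prob_not_yet_initiated(t - 1, a, b) - prob_not_yet_initiated(t, a, b))
        for t in range(1, n_days + 1)
    ]


def expected_additional(n_observed, t_initial, t_extra, a, b):
    """Plug-in forecast of additional participants over the next t_extra days.

    Assumes a and b are known (an illustrative simplification): the population
    size is estimated as n_observed / P(initiated by t_initial), then scaled by
    the probability of first initiating during the subsequent period.
    """
    s0 = prob_not_yet_initiated(t_initial, a, b)
    s1 = prob_not_yet_initiated(t_initial + t_extra, a, b)
    return n_observed * (s0 - s1) / (1.0 - s0)


if __name__ == "__main__":
    # Illustrative values only: 1000 individuals, uniform heterogeneity (a = b = 1).
    for day, count in enumerate(expected_new_by_day(1000, 5, 1.0, 1.0), start=1):
        print(f"day {day}: expected new participants = {count:.1f}")
    print(f"forecast after day 1: {expected_additional(500, 1, 1, 1.0, 1.0):.1f}")
```

With a = b = 1 the expected daily counts decline quickly, because the individuals most likely to initiate are used up first. A model that treats daily counts as identically distributed would therefore overestimate how many new individuals arrive in later days.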