A/B Testing Measurement Framework for Recommendation Models Based on Expected Revenue
This work addresses the problem of making A/B testing for revenue optimization in e-commerce recommendation systems more efficient and objective; it represents an incremental improvement over existing methods.
The paper measures revenue-per-visit (RPV) improvements in recommendation systems by splitting RPV into conversion rate and average order value (AOV). It applies Lachenbruch's two-part test and, when that test is inconclusive about RPV, two alternative tests that assume non-zero purchase values are log-normally distributed. On average, the resulting method needs a smaller sample size than other methods, requires no subjective outlier removal, and provides a confidence interval that characterizes the uncertainty around RPV. The log-normal assumption is validated empirically with data from Staples.com.
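As a minimal sketch of the decomposition, assume each visit is recorded as one purchase value, with 0 for visits that did not convert. RPV then factors as conversion rate p times AOV. If non-zero purchase values are log-normal with log-scale mean μ and variance σ², AOV = exp(μ + σ²/2), so log RPV = log p + μ + σ²/2. The interval below treats the two parts as independent and adds a delta-method variance for log p to Cox's variance for a log-normal mean. This is an illustrative construction, not necessarily the confidence interval the paper derives, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def rpv_decomposition(purchases: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: split RPV into conversion rate and AOV.

    `purchases` holds one value per visit; 0.0 means the visit did not convert.
    """
    orders = [x for x in purchases if x > 0]
    conversion_rate = len(orders) / len(purchases)
    aov = statistics.mean(orders) if orders else 0.0
    # conversion_rate * aov equals the plain mean purchase value over all visits.
    return conversion_rate, aov, conversion_rate * aov


def lognormal_rpv_ci(purchases: Sequence[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Hypothetical helper: RPV estimate and CI assuming log-normal order values.

    Returns (estimate, lower, upper). Illustrative sketch, not the paper's method.
    """
    n = len(purchases)
    logs = [math.log(x) for x in purchases if x > 0]
    m = len(logs)
    if m < 2:
        raise ValueError("need at least two converting visits")
    p = m / n
    mu = statistics.mean(logs)
    s2 = statistics.variance(logs)  # sample variance of log order values

    # log RPV = log p + mu + sigma^2 / 2
    log_rpv = math.log(p) + mu + s2 / 2
    # Delta-method variance of log p-hat plus Cox's variance for mu + s2/2.
    var = (1 - p) / (n * p) + s2 / m + s2 ** 2 / (2 * (m - 1))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    return math.exp(log_rpv), math.exp(log_rpv - half_width), math.exp(log_rpv + half_width)
```

For example, `rpv_decomposition([0.0, 0.0, 25.0, 0.0, 40.0, 0.0, 0.0, 18.0, 0.0, 60.0])` gives a conversion rate of 0.4 and an AOV of 35.75, whose product matches the mean over all ten visits. Working on the log scale keeps the interval strictly positive, which suits a quantity like RPV.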
We provide a method to determine whether a new recommendation system improves revenue per visit (RPV) compared to the status quo. To do so, we split RPV into conversion rate and average order value (AOV). We use the two-part test suggested by Lachenbruch to determine whether the data-generating process in the new system is different. In cases where this test does not give a definitive answer about the change in RPV, we propose two alternative tests to determine whether RPV has changed. Both tests rely on the assumption that non-zero purchase values follow a log-normal distribution. We empirically validate this assumption using data collected at different points in time from Staples.com. On average, our method needs a smaller sample size than other methods. Furthermore, it does not require any subjective outlier removal. Finally, it characterizes the uncertainty around RPV by providing a confidence interval.
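A two-part test of the kind attributed to Lachenbruch combines a statistic B for the binary part (did the visit convert?) with a statistic T for the continuous part (how large was the order?) into X² = B² + T², which under the null hypothesis is approximately chi-square with 2 degrees of freedom. The sketch below is one possible variant, assuming a pooled two-proportion z statistic for B and a large-sample Welch-style z statistic on log order values for T; other choices for T, such as a t-test or a rank-based test, are also possible. The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def two_part_test(control: Sequence[float], treatment: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical two-part test; returns (chi-square statistic, p-value).

    Each sequence holds one purchase value per visit (0.0 = no conversion).
    """
    n1, n2 = len(control), len(treatment)
    logs1 = [math.log(x) for x in control if x > 0]
    logs2 = [math.log(x) for x in treatment if x > 0]
    m1, m2 = len(logs1), len(logs2)
    if m1 < 2 or m2 < 2:
        raise ValueError("each arm needs at least two converting visits")

    # Part 1 (B): pooled two-proportion z statistic for conversion rates.
    pooled = (m1 + m2) / (n1 + n2)
    if pooled >= 1.0:
        raise ValueError("every visit converted; binary part is undefined")
    se_b = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    b = (m2 / n2 - m1 / n1) / se_b

    # Part 2 (T): large-sample z statistic comparing mean log order values
    # (assumes roughly log-normal non-zero purchases).
    se_t = math.sqrt(statistics.variance(logs1) / m1 + statistics.variance(logs2) / m2)
    t = (statistics.mean(logs2) - statistics.mean(logs1)) / se_t

    chi2 = b ** 2 + t ** 2
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value
```

Because B and T can point in opposite directions (for example, more conversions but smaller orders), a significant result says that the data-generating process changed, not that RPV increased. That is exactly the case in which a direct test on RPV, such as one built on the log-normal model, is needed.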