Lukas Vermeer

4 papers
42 citations
Novelty 16%
AI Score 31

4 Papers

Apr 6
Power Analysis is Essential: High-Powered Tests Suggest Minimal to No Effect of Rounded Shapes on Click-Through Rates

Ron Kohavi, Jakub Linowski, Lukas Vermeer et al.

Underpowered studies (below 50% power) suffer from the winner's curse: A statistically significant positive estimate must exaggerate the true treatment effect to meet the significance threshold. A study by Dipayan Biswas, Annika Abell, and Roger Chacko published in the Journal of Consumer Research (2023) reported that in an A/B test, simply rounding the corners of square buttons increased the online click-through rate by 55% (p-value 0.037), a striking finding with potentially wide-ranging implications for a digital industry that is seeking to enhance consumer engagement. Drawing on our experience with tens of thousands of A/B tests, many involving similar user interface modifications, we found this dramatic claim implausibly large. To evaluate the claim and provide a more accurate estimate of the treatment effect, we conducted three high-powered A/B tests, each involving over two thousand times more users than the original study. All three experiments yielded effect size estimates that were approximately two orders of magnitude smaller than initially reported, with 95% confidence intervals that include zero (i.e., not statistically significant at the 0.05 level). Two additional independent replications by Evidoo found similarly small effects. These findings underscore the critical importance of power analysis and experimental design in increasing trust and reproducibility of results.
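
The winner's-curse argument can be made concrete with the standard normal approximation for a two-sided two-proportion z-test. The sketch below illustrates the general idea and is not the authors' analysis. The function names are hypothetical. The baseline click-through rate (2%) and the true lift (1% relative) in the usage lines are invented numbers, not figures from the study. `exaggeration_ratio` uses the mean of a normal distribution truncated at the significance threshold. It shows how far, on average, a significant positive estimate overshoots the true effect.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def se_diff(p1: float, p2: float, n: int) -> float:
    """Standard error of the difference of two proportions, n users per arm."""
    return sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)


def power(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = abs(p2 - p1) / se_diff(p1, p2, n)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def users_per_arm(p1: float, p2: float, alpha: float = 0.05,
                  target_power: float = 0.8) -> int:
    """Users per arm needed to reach target_power under the same approximation."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(target_power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)


def exaggeration_ratio(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Expected estimate / true effect, given a significant positive result.

    The estimate X ~ N(delta, se). It is significant on the positive side when
    X > c = z_crit * se. The truncated-normal mean is
    E[X | X > c] = delta + se * pdf(a) / (1 - cdf(a)), with a = (c - delta) / se.
    """
    delta = p2 - p1
    se = se_diff(p1, p2, n)
    c = Z.inv_cdf(1 - alpha / 2) * se
    a = (c - delta) / se
    mean_if_significant = delta + se * Z.pdf(a) / (1 - Z.cdf(a))
    return mean_if_significant / delta


# Hypothetical inputs (not from the paper): 2% baseline CTR, 1% relative lift.
p_control, p_treatment = 0.02, 0.0202
print(f"users per arm for 80% power: {users_per_arm(p_control, p_treatment):,}")
print(f"power at 100,000 users per arm: {power(p_control, p_treatment, 100_000):.3f}")
print(f"exaggeration if significant: {exaggeration_ratio(p_control, p_treatment, 100_000):.1f}x")
```

Under these assumed numbers, reaching 80% power takes several million users per arm. With 100,000 users per arm, power is far below 50%, and any estimate that clears the significance bar at that size is several times the true effect. Exact figures depend on the normal approximation.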

Oct 29, 2018
Mediation Analysis in Online Experiments at Booking.com: Disentangling Direct and Indirect Effects

Bahattin Tolga Öztan, Zoé van Havre, Caio Gomes et al.

Online experimentation is at the core of Booking.com's customer-centric product development. While randomised controlled trials are a powerful tool for estimating the overall effects of product changes on business metrics, they often fall short in explaining the mechanism of change. This becomes problematic when decision-making depends on being able to distinguish between the direct effect of a treatment on some outcome variable and its indirect effect via a mediator variable. In this paper, we demonstrate the need for mediation analyses in online experimentation, and use simulated data to show how these methods help identify and estimate the direct causal effect. Failing to account for all confounders can lead to biased estimates, so we include sensitivity analyses to help gauge the robustness of estimates to missing causal factors.
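
A minimal simulation can show why the total effect alone hides the mechanism and why an omitted confounder matters. The sketch assumes a linear model with a randomised binary treatment T, a mediator M and an outcome Y. It estimates the direct effect as the coefficient on T when regressing Y on T and M. It estimates the indirect effect as the product of the T-to-M effect and the M coefficient. This linear product-of-coefficients approach was chosen for brevity and is not necessarily the estimator used in the paper. The function names and all parameter values are invented for illustration.

```python
import random
from statistics import fmean


def simulate(n, a, b, c, g, h, seed=0):
    """Simulate a randomised experiment with a mediator (hypothetical linear model).

    T: randomised treatment (0/1); U: unobserved factor
    M = a*T + g*U + noise          (mediator)
    Y = c*T + b*M + h*U + noise    (outcome)
    True direct effect = c, true indirect effect = a*b.
    A non-zero h makes U a mediator-outcome confounder.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = 1.0 if rng.random() < 0.5 else 0.0
        u = rng.gauss(0, 1)
        m = a * t + g * u + rng.gauss(0, 1)
        y = c * t + b * m + h * u + rng.gauss(0, 1)
        rows.append((t, m, y))
    return rows


def diff_in_means(rows, col):
    """Treated mean minus control mean of column col (1 = M, 2 = Y)."""
    treated = [r[col] for r in rows if r[0] == 1.0]
    control = [r[col] for r in rows if r[0] == 0.0]
    return fmean(treated) - fmean(control)


def ols_t_m(rows):
    """OLS of Y on T and M (with intercept) via centred 2x2 normal equations."""
    t_bar = fmean(r[0] for r in rows)
    m_bar = fmean(r[1] for r in rows)
    y_bar = fmean(r[2] for r in rows)
    stt = stm = smm = sty = smy = 0.0
    for t, m, y in rows:
        dt, dm, dy = t - t_bar, m - m_bar, y - y_bar
        stt += dt * dt
        stm += dt * dm
        smm += dm * dm
        sty += dt * dy
        smy += dm * dy
    det = stt * smm - stm * stm
    coef_t = (sty * smm - smy * stm) / det  # direct effect estimate
    coef_m = (smy * stt - sty * stm) / det  # mediator-to-outcome estimate
    return coef_t, coef_m


def decompose(rows):
    """Split the total effect into direct and indirect (product-of-coefficients) parts."""
    total = diff_in_means(rows, 2)
    a_hat = diff_in_means(rows, 1)
    direct, b_hat = ols_t_m(rows)
    return {"total": total, "direct": direct, "indirect": a_hat * b_hat}


clean = decompose(simulate(20_000, a=1.0, b=0.5, c=0.3, g=1.0, h=0.0))
confounded = decompose(simulate(20_000, a=1.0, b=0.5, c=0.3, g=1.0, h=1.0))
print("no confounder:  ", {k: round(v, 2) for k, v in clean.items()})
print("with confounder:", {k: round(v, 2) for k, v in confounded.items()})
```

When `h` is non-zero, the unobserved U affects both M and Y. Conditioning on M then links T to U, so the direct-effect estimate drifts well away from the true 0.3 even though the treatment is randomised. The total effect stays unbiased because it only compares randomised groups.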

Oct 23, 2017
Democratizing online controlled experiments at Booking.com

Raphael Lopez Kaufman, Jegar Pitchforth, Lukas Vermeer

There is an extensive literature on online controlled experiments. It covers the statistical methods available to analyze experiment results, the infrastructure built by several large-scale Internet companies, and the organizational challenges of embracing online experiments to inform product development. At Booking.com we have been conducting evidence-based product development using online experiments for more than ten years. Our methods and infrastructure were designed from their inception to reflect Booking.com's culture, that is, with democratization and decentralization of experimentation and decision making in mind. In this paper we explain how four practices have allowed an organization as large as Booking.com to truly and successfully democratize experimentation: building a central repository of successes and failures to allow for knowledge sharing; having a generic and extensible code library that enforces loose coupling between experimentation and business logic; closely and transparently monitoring the quality and reliability of the data-gathering pipelines to build trust in the experimentation infrastructure; and putting safeguards in place that enable anyone to take end-to-end ownership of their experiments.
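
One way to read "loose coupling between experimentation and business logic" is that product code asks a library which variant to show and never handles assignment or logging itself. The sketch below is hypothetical. `ExperimentClient`, `variant`, `render_button` and the experiment name `example_experiment` are invented names, not Booking.com's library. SHA-256 hashing of the experiment name and visitor id is one common assignment scheme, assumed here for illustration.

```python
import hashlib


class ExperimentClient:
    """Hypothetical experiment library: business code only asks 'which variant?'.

    Assignment is deterministic. Hashing (experiment name, visitor id) gives a
    visitor the same variant on every request without any stored state.
    """

    def __init__(self, variants=("base", "treatment")):
        self.variants = variants
        self.exposures = []  # (experiment, visitor_id, variant) log for analysis

    def variant(self, experiment: str, visitor_id: str) -> str:
        key = f"{experiment}:{visitor_id}".encode()
        digest = hashlib.sha256(key).digest()
        bucket = int.from_bytes(digest[:8], "big") % len(self.variants)
        chosen = self.variants[bucket]
        self.exposures.append((experiment, visitor_id, chosen))
        return chosen


def render_button(client: ExperimentClient, visitor_id: str) -> str:
    # Business logic knows the experiment name and the two behaviours,
    # nothing about hashing, exposure logging or statistics.
    if client.variant("example_experiment", visitor_id) == "treatment":
        return "<button class='rounded'>Book</button>"
    return "<button class='square'>Book</button>"


client = ExperimentClient()
print(render_button(client, "visitor-42"))
print(len(client.exposures))
```

Salting the hash with the experiment name keeps assignments independent across concurrent experiments. Because the exposure log lives inside the library, the analysis side can evolve without touching product code.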

Oct 1, 2017
Leaky Abstraction In Online Experimentation Platforms: A Conceptual Framework To Categorize Common Challenges

Timo Kluck, Lukas Vermeer

Online experimentation platforms abstract away many of the details of experimental design, ensuring experimenters do not have to worry about sampling, randomisation, subject tracking, data collection, metric definition and interpretation of results. The recent success and rapid adoption of these platforms in the industry might in part be attributed to the ease of use these abstractions provide. Previous authors have pointed out that there are common pitfalls to avoid when running controlled experiments on the web, and have emphasised the need for experts familiar with the entire software stack to be involved in the process. In this paper, we argue that these pitfalls and the need to understand the underlying complexity are not the result of shortcomings specific to existing platforms which might be solved by better platform design. We postulate that they are a direct consequence of what is commonly referred to as "the law of leaky abstractions". That is, it is an inherent feature of any software platform that details of its implementation leak to the surface, and that in certain situations the platform's consumers necessarily need to understand details of the underlying systems in order to make proficient use of it. We present several examples of this concept, including examples from the literature, and suggest some possible mitigation strategies that can be employed to reduce the impact of abstraction leakage. The conceptual framework put forward in this paper allows us to explicitly categorize experimentation pitfalls in terms of which specific abstraction is leaking, thereby helping implementers and users of these platforms to better understand and tackle the challenges they face.
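
A sample ratio mismatch (SRM) is one concrete way a leak in the subject-tracking or data-collection abstraction can surface. For example, if the treatment's logging fires less reliably, the observed split drifts away from the configured one even though randomisation itself is correct. The check below is a standard chi-square goodness-of-fit test, offered as an illustrative sketch rather than a method taken from the paper. The function name `srm_p_value` and the counts in the usage lines are hypothetical.

```python
from math import erfc, sqrt


def srm_p_value(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) for a two-variant sample ratio mismatch.

    expected_share is the configured fraction of traffic sent to control.
    """
    total = n_control + n_treatment
    exp_control = total * expected_share
    exp_treatment = total * (1 - expected_share)
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


print(srm_p_value(50_000, 50_000))  # 1.0
print(srm_p_value(50_000, 48_800))  # hypothetical: treatment under-counted
```

A very small p-value suggests the results should not be trusted until the cause of the imbalance is found. Many teams use a strict cut-off such as 0.001 for this check.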