Automated metrics calculation in a dynamic heterogeneous environment
This addresses the problem of scaling experimentation for various teams at Microsoft, but it is incremental as it builds on existing industry practices.
The paper tackles the challenge of providing a single platform for software experimentation in a highly heterogeneous and evolving ecosystem at Microsoft, resulting in a system that supports diverse products, compute fabrics, and user personas to democratize metric definition and analysis.
A consistent theme in software experimentation at Microsoft has been solving problems of experimentation at scale for a diverse set of products. Running experiments at scale (i.e., many experiments on many users) has become state of the art across the industry. However, providing a single platform that allows software experimentation in a highly heterogenous and constantly evolving ecosystem remains a challenge. In our case, heterogeneity spans multiple dimensions. First, we need to support experimentation for many types of products: websites, search engines, mobile apps, operating systems, cloud services and others. Second, due to the diversity of the products and teams using our platform, it needs to be flexible enough to analyze data in multiple compute fabrics (e.g. Spark, Azure Data Explorer), with a way to easily add support for new fabrics if needed. Third, one of the main factors in facilitating growth of experimentation culture in an organization is to democratize metric definition and analysis processes. To achieve that, our system needs to be simple enough to be used not only by data scientists, but also engineers, product managers and sales teams. Finally, different personas might need to use the platform for different types of analyses, e.g. dashboards or experiment analysis, and the platform should be flexible enough to accommodate that. This paper presents our solution to the problems of heterogeneity listed above.