Boshuang Huang

2 Papers

LG, Oct 14, 2021

Looper: An end-to-end ML platform for product decisions

Igor L. Markov, Hanson Wang, Nitya Kasturi et al.

Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support fine-grained product-metric evaluation and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce general principles for and the architecture of an ML platform, Looper, with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogeneous treatment effects, and Bayesian tuning for product goals. During the 2021 production deployment, Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We sum up experiences of platform adopters and describe their learning curve.

LG, Apr 19, 2019

Disagreement-based Active Learning in Online Settings

Boshuang Huang, Sudeep Salgia, Qing Zhao

We study online active learning for classifying streaming instances within the framework of statistical learning theory. At each time, the learner either queries the label of the current instance or predicts the label based on previously seen examples. The objective is to minimize the number of queries while constraining the number of prediction errors over a horizon of length $T$. We develop a disagreement-based online learning algorithm for a general hypothesis space and under Tsybakov noise. We show that the proposed algorithm has a label complexity of $O(dT^{\frac{2-2\alpha}{2-\alpha}}\log^2 T)$ under a constraint of bounded regret in terms of classification errors, where $d$ is the VC dimension of the hypothesis space and $\alpha$ is the Tsybakov noise parameter. We further establish a matching (up to a poly-logarithmic factor) lower bound, demonstrating the order optimality of the proposed algorithm. We address the tradeoff between label complexity and regret and show that the algorithm can be modified to operate at a different point on the tradeoff curve.
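
To make the query-or-predict rule concrete, here is a minimal sketch of a disagreement-based learner in a much simpler setting than the one the abstract treats. It is not the paper's algorithm: it assumes a noise-free (realizable) stream, a hypothesis class of 1-D thresholds on [0, 1), and a hypothetical function name `run_disagreement_learner`. The learner queries a label only when the hypotheses still consistent with past labels disagree on the current instance; otherwise it predicts their common label.

```python
import random


def run_disagreement_learner(stream, true_threshold):
    """Toy disagreement-based online learner (illustrative assumption only).

    Hypotheses are thresholds h_t(x) = 1 if x >= t else 0, with t in (0, 1].
    Labels are noise-free, so the set of consistent thresholds is always
    the interval (lo, hi].
    """
    lo, hi = 0.0, 1.0  # consistent thresholds t satisfy lo < t <= hi
    queries = 0
    errors = 0
    for x in stream:
        label = 1 if x >= true_threshold else 0  # ground truth, seen only on query
        if lo < x < hi:
            # Consistent hypotheses disagree on x: query the label
            # and shrink the interval of consistent thresholds.
            queries += 1
            if label == 1:
                hi = x  # the threshold is at most x
            else:
                lo = x  # the threshold is strictly above x
        else:
            # All consistent hypotheses agree: predict without querying.
            prediction = 1 if x >= hi else 0
            if prediction != label:
                errors += 1
    return queries, errors


rng = random.Random(0)
stream = [rng.random() for _ in range(10_000)]
queries, errors = run_disagreement_learner(stream, true_threshold=0.37)
print(f"queries={queries}, errors={errors}, horizon={len(stream)}")
```

In this noise-free toy case the learner never mispredicts, and the disagreement region shrinks quickly, so queries become rare as the stream goes on. The setting in the abstract is harder: under Tsybakov noise a single label cannot eliminate a hypothesis outright, which is where the dependence on $\alpha$ in the label complexity comes from.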