MLAug 22, 2018
Genie: An Open Box Counterfactual Policy Estimator for Optimizing Sponsored Search MarketplaceMurat Ali Bayir, Mingsen Xu, Yaojia Zhu et al.
In this paper, we propose an offline counterfactual policy estimation framework called Genie to optimize Sponsored Search Marketplace. Genie employs an open box simulation engine with click calibration model to compute the KPI impact of any modification to the system. From the experimental results on Bing traffic, we showed that Genie performs better than existing observational approaches that employs randomized experiments for traffic slices that have frequent policy updates. We also show that Genie can be used to tune completely new policies efficiently without creating risky randomized experiments due to cold start problem. As time of today, Genie hosts more than 10000 optimization jobs yearly which runs more than 30 Million processing node hours of big data jobs for Bing Ads. For the last 3 years, Genie has been proven to be the one of the major platforms to optimize Bing Ads Marketplace due to its reliability under frequent policy changes and its efficiency to minimize risks in real experiments.
LGMar 28, 2013
Scalable Text and Link Analysis with Mixed-Topic Link ModelsYaojia Zhu, Xiaoran Yan, Lise Getoor et al.
Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation, analyzing a data set with 1.3 million words and 44 thousand links in a few minutes.
SIJul 17, 2012
Model Selection for Degree-corrected Block ModelsXiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen et al.
The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its over-all degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.
SIMay 31, 2012
Oriented and Degree-generated Block Models: Generating and Inferring Communities with Inhomogeneous Degree DistributionsYaojia Zhu, Xiaoran Yan, Cristopher Moore
The stochastic block model is a powerful tool for inferring community structure from network topology. However, it predicts a Poisson degree distribution within each community, while most real-world networks have a heavy-tailed degree distribution. The degree-corrected block model can accommodate arbitrary degree distributions within communities. But since it takes the vertex degrees as parameters rather than generating them, it cannot use them to help it classify the vertices, and its natural generalization to directed graphs cannot even use the orientations of the edges. In this paper, we present variants of the block model with the best of both worlds: they can use vertex degrees and edge orientations in the classification process, while tolerating heavy-tailed degree distributions within communities. We show that for some networks, including synthetic networks and networks of word adjacencies in English text, these new block models achieve a higher accuracy than either standard or degree-corrected block models.