GTFeb 25, 2022
Bidding Agent Design in the LinkedIn Ad MarketplaceYuan Gao, Kaiyu Yang, Yuanlong Chen et al.
We establish a general optimization framework for the design of automated bidding agent in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms each running its own auction format. Bidding strategy derived from this framework automatically guarantees the optimality of budget allocation across ad units and platforms. Common constraints such as budget delivery schedule, return on investments and guaranteed results, directly translates to additional parameters in the bidding formula. We share practical learnings of the deployed bidding system in the LinkedIn ad marketplace based on this framework.
AIJun 23, 2020
A Framework for Fairness in Two-Sided MarketplacesKinjal Basu, Cyrus DiCiccio, Heloise Logan et al.
Many interesting problems in the Internet industry can be framed as a two-sided marketplace problem. Examples include search applications and recommender systems showing people, jobs, movies, products, restaurants, etc. Incorporating fairness while building such systems is crucial and can have a deep social and economic impact (applications include job recommendations, recruiters searching for candidates, etc.). In this paper, we propose a definition and develop an end-to-end framework for achieving fairness while building such machine learning systems at scale. We extend prior work to develop an optimization framework that can tackle fairness constraints from both the source and destination sides of the marketplace, as well as dynamic aspects of the problem. The framework is flexible enough to adapt to different definitions of fairness and can be implemented in very large-scale settings. We perform simulations to show the efficacy of our approach.
MLJun 19, 2020
Achieving Fairness via Post-Processing in Web-Scale Recommender SystemsPreetam Nandy, Cyrus Diciccio, Divya Venugopalan et al.
Building fair recommender systems is a challenging and crucial area of study due to its immense impact on society. We extended the definitions of two commonly accepted notions of fairness to recommender systems, namely equality of opportunity and equalized odds. These fairness measures ensure that equally "qualified" (or "unqualified") candidates are treated equally regardless of their protected attribute status (such as gender or race). We propose scalable methods for achieving equality of opportunity and equalized odds in rankings in the presence of position bias, which commonly plagues data generated from recommender systems. Our algorithms are model agnostic in the sense that they depend only on the final scores provided by a model, making them easily applicable to virtually all web-scale recommender systems. We conduct extensive simulations as well as real-world experiments to show the efficacy of our approach.
MEAug 2, 2016
Can we trust the bootstrap in high-dimension?Noureddine El Karoui, Elizabeth Purdom
We consider the performance of the bootstrap in high-dimensions for the setting of linear regression, where $p<n$ but $p/n$ is not close to zero. We consider ordinary least-squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of $β$? (where $β$ is the true regression vector). We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression -- residual bootstrap and pairs bootstrap -- give very poor inference on $β$ as the ratio $p/n$ grows. We find that the residuals bootstrap tend to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio $p/n$ grows. We also show that the jackknife resampling technique for estimating the variance of $\hatβ$ severely overestimates the variance in high dimensions. We contribute alternative bootstrap procedures based on our theoretical results that mitigate these problems. However, the corrections depend on assumptions regarding the underlying data-generation model, suggesting that in high-dimensions it may be difficult to have universal, robust bootstrapping techniques.
STMay 23, 2014
Connection graph Laplacian methods can be made robust to noiseNoureddine El Karoui, Hau-tieng Wu
Recently, several data analytic techniques based on connection graph laplacian (CGL) ideas have appeared in the literature. At this point, the properties of these methods are starting to be understood in the setting where the data is observed without noise. We study the impact of additive noise on these methods, and show that they are remarkably robust. As a by-product of our analysis, we propose modifications of the standard algorithms that increase their robustness to noise. We illustrate our results in numerical simulations.
PROct 1, 2013
Graph connection Laplacian and random matrices with random blocksNoureddine El Karoui, Hau-tieng Wu
Graph connection Laplacian (GCL) is a modern data analysis technique that is starting to be applied for the analysis of high dimensional and massive datasets. Motivated by this technique, we study matrices that are akin to the ones appearing in the null case of GCL, i.e the case where there is no structure in the dataset under investigation. Developing this understanding is important in making sense of the output of the algorithms based on GCL. We hence develop a theory explaining the behavior of the spectral distribution of a large class of random matrices, in particular random matrices with random block entries of fixed size. Part of the theory covers the case where there is significant dependence between the blocks. Numerical work shows that the agreement between our theoretical predictions and numerical simulations is generally very good.
NAFeb 5, 2010
Second order accurate distributed eigenvector computation for extremely large matricesNoureddine El Karoui, Alexandre d'Aspremont
We propose a second-order accurate method to estimate the eigenvectors of extremely large matrices thereby addressing a problem of relevance to statisticians working in the analysis of very large datasets. More specifically, we show that averaging eigenvectors of randomly subsampled matrices efficiently approximates the true eigenvectors of the original matrix under certain conditions on the incoherence of the spectral decomposition. This incoherence assumption is typically milder than those made in matrix completion and allows eigenvectors to be sparse. We discuss applications to spectral methods in dimensionality reduction and information retrieval.