Konstantinos Stavropoulos

LG
h-index39
20papers
157citations
Novelty68%
AI Score59

20 Papers

69.0LGMar 16
The Importance of Being Smoothly Calibrated

Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar et al. · harvard

Recent work has highlighted the centrality of smooth calibration [Kakade and Foster, 2008] as a robust measure of calibration error. We generalize, unify, and extend previous results on smooth calibration, both as a robust calibration measure, and as a step towards omniprediction, which enables predictions with low regret for downstream decision makers seeking to optimize some proper loss unknown to the predictor. We present a new omniprediction guarantee for smoothly calibrated predictors, for the class of all bounded proper losses. We smooth the predictor by adding some noise to it, and compete against smoothed versions of any benchmark predictor on the space, where we add some noise to the predictor and then post-process it arbitrarily. The omniprediction error is bounded by the smooth calibration error of the predictor and the earth mover's distance from the benchmark. We exhibit instances showing that this dependence cannot, in general, be improved. We show how this unifies and extends prior results [Foster and Vohra, 1998; Hartline, Wu, and Yang, 2025] on omniprediction from smooth calibration. We present a crisp new characterization of smooth calibration in terms of the earth mover's distance to the closest perfectly calibrated joint distribution of predictions and labels. This also yields a simpler proof of the relation to the lower distance to calibration from [Blasiok, Gopalan, Hu, and Nakkiran, 2023]. We use this to show that the upper distance to calibration cannot be estimated within a quadratic factor with sample complexity independent of the support size of the predictions. This is in contrast to the distance to calibration, where the corresponding problem was known to be information-theoretically impossible: no finite number of samples suffice [Blasiok, Gopalan, Hu, and Nakkiran, 2023].

DSNov 25, 2023
Testable Learning with Distribution Shift

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.

LGJun 18, 2023
Agnostically Learning Single-Index Models using Omnipredictors

Aravind Gollakota, Parikshit Gopalan, Adam R. Klivans et al.

We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$^+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting.

LGFeb 28, 2023
An Efficient Tester-Learner for Halfspaces

Aravind Gollakota, Adam R. Klivans, Konstantinos Stavropoulos et al.

We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or more generally any strongly log-concave distribution) in $d$ dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error $\mathsf{opt} + ε$ for any strongly log-concave target distribution. For adversarial noise, our tester-learner obtains error $O(\mathsf{opt}) + ε$ in polynomial time when the target distribution is Gaussian; for strongly log-concave distributions, we obtain $\tilde{O}(\mathsf{opt}) + ε$ in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD but in the testable learning setting.

LGJul 1, 2024
Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

Gautam Chandrasekaran, Adam Klivans, Vasilis Kontonis et al.

In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on $\mathbb{R}^d \times \{\pm 1\}$ -- is to output a hypothesis that is competitive (to within $ε$) of the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in time $k^{poly(\frac{\log k}{εγ}) }$ where $γ$ is the margin parameter. Before our work, the best-known runtime was exponential in $k$ (Arriaga and Vempala, 1999).

DSNov 10, 2025
A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube

Gautam Chandrasekaran, Adam R. Klivans, Konstantinos Stavropoulos et al.

We give the first fully polynomial-time algorithm for learning halfspaces with respect to the uniform distribution on the hypercube in the presence of contamination, where an adversary may corrupt some fraction of examples and labels arbitrarily. We achieve an error guarantee of $η^{O(1)}+ε$ where $η$ is the noise rate. Such a result was not known even in the agnostic setting, where only labels can be adversarially corrupted. All prior work over the last two decades has a superpolynomial dependence in $1/ε$ or succeeds only with respect to continuous marginals (such as log-concave densities). Previous analyses rely heavily on various structural properties of continuous distributions such as anti-concentration. Our approach avoids these requirements and makes use of a new algorithm for learning Generalized Linear Models (GLMs) with only a polylogarithmic dependence on the activation function's Lipschitz constant. More generally, our framework shows that supervised learning with respect to discrete distributions is not as difficult as previously thought.

LGNov 9, 2025
Sparse Linear Regression is Easy on Random Supports

Gautam Chandrasekaran, Raghu Meka, Konstantinos Stavropoulos

Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \in \mathbb{R}^{N \times d}$ and measurements or labels ${y} \in \mathbb{R}^N$ where ${y} = {X} {w}^* + ξ$, and $ξ$ is the noise in the measurements. Importantly, we have the additional constraint that the unknown signal vector ${w}^*$ is sparse: it has $k$ non-zero entries where $k$ is much smaller than the ambient dimension. Our goal is to output a prediction vector $\widehat{w}$ that has small prediction error: $\frac{1}{N}\cdot \|{X} {w}^* - {X} \widehat{w}\|^2_2$. Information-theoretically, we know what is best possible in terms of measurements: under most natural noise distributions, we can get prediction error at most $ε$ with roughly $N = O(k \log d/ε)$ samples. Computationally, this currently needs $d^{Ω(k)}$ run-time. Alternately, with $N = O(d)$, we can get polynomial-time. Thus, there is an exponential gap (in the dependence on $d$) between the two and we do not know if it is possible to get $d^{o(k)}$ run-time and $o(d)$ samples. We give the first generic positive result for worst-case design matrices ${X}$: For any ${X}$, we show that if the support of ${w}^*$ is chosen at random, we can get prediction error $ε$ with $N = \text{poly}(k, \log d, 1/ε)$ samples and run-time $\text{poly}(d,N)$. This run-time holds for any design matrix ${X}$ with condition number up to $2^{\text{poly}(d)}$. Previously, such results were known for worst-case ${w}^*$, but only for random design matrices from well-behaved families, matrices that have a very low condition number ($\text{poly}(\log d)$; e.g., as studied in compressed sensing), or those with special structural properties.

LGOct 24, 2022
Learning and Covering Sums of Independent Random Variables with Unbounded Support

Alkis Kalavasis, Konstantinos Stavropoulos, Manolis Zampetakis

We study the problem of covering and learning sums $X = X_1 + \cdots + X_n$ of independent integer-valued random variables $X_i$ (SIIRVs) with unbounded, or even infinite, support. De et al. at FOCS 2018, showed that the maximum value of the collective support of $X_i$'s necessarily appears in the sample complexity of learning $X$. In this work, we address two questions: (i) Are there general families of SIIRVs with unbounded support that can be learned with sample complexity independent of both $n$ and the maximal element of the support? (ii) Are there general families of SIIRVs with unbounded support that admit proper sparse covers in total variation distance? As for question (i), we provide a set of simple conditions that allow the unbounded SIIRV to be learned with complexity $\text{poly}(1/ε)$ bypassing the aforementioned lower bound. We further address question (ii) in the general setting where each variable $X_i$ has unimodal probability mass function and is a different member of some, possibly multi-parameter, exponential family $\mathcal{E}$ that satisfies some structural properties. These properties allow $\mathcal{E}$ to contain heavy tailed and non log-concave distributions. Moreover, we show that for every $ε> 0$, and every $k$-parameter family $\mathcal{E}$ that satisfies some structural assumptions, there exists an algorithm with $\tilde{O}(k) \cdot \text{poly}(1/ε)$ samples that learns a sum of $n$ arbitrary members of $\mathcal{E}$ within $ε$ in TV distance. The output of the learning algorithm is also a sum of random variables whose distribution lies in the family $\mathcal{E}$. En route, we prove that any discrete unimodal exponential family with bounded constant-degree central moments can be approximated by the family corresponding to a bounded subset of the initial (unbounded) parameter space.

55.8DSMay 17
Iterative Chow Filtering for Learning with Distribution Shift

Gautam Chandrasekaran, Georgios Gkrinias, Adam R. Klivans et al.

Recent work due to Goel et al. gave the first efficient algorithms for learning with distribution shift in the challenging PQ framework. In this setting, a learner receives labeled training examples, unlabeled test examples, and must make correct predictions on the test set but is allowed to abstain from predicting on out-of-distribution points. Their results rely on ${\cal L}_2$ sandwiching approximations, a strong requirement that leads to poor bounds for several basic function classes such as DNF formulas. Here, we show that the weaker notion of ${\cal L}_1$ sandwiching suffices for efficient PQ learning. As a consequence, we obtain the first quasipolynomial-time PQ learning algorithm for DNFs under the uniform distribution and essentially match the guarantees known for ordinary PAC learning. More broadly, our bounds provide exponential improvements for several classes including constant depth circuits and constant degree polynomial threshold functions. Our main technical ingredient is Iterative Chow Filtering, a new procedure that uses low-degree Chow parameters to identify and remove test points incompatible with the training distribution.

52.5DSMay 7
Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift

Adam R. Klivans, Shyamal Patel, Konstantinos Stavropoulos et al.

Recent work on provably efficient algorithms for learning with distribution shift has focused on two models: PQ learning (Goldwasser et al. (2020)) and TDS learning (Klivans et al. (2024)). Algorithms for TDS learning are allowed to reject a test set entirely if distribution shift is detected. In contrast, PQ learners may only reject points that are deemed out-of-distribution on an individual basis. Our main result is a surprising equivalence between these two models in the distribution-free setting. In particular, we give an efficient black-box reduction from PQ learning to TDS learning for any Boolean concept class. This equivalence implies the first hardness results for distribution-free TDS learning of basic classes such as halfspaces. The main technical contribution underlying our equivalence is a method for boosting, via branching programs, the weak distinguishing power of TDS learners that have rejected the target domain. We also show that giving a learner access to membership queries sidesteps these hardness results and allows for efficient, distribution-free PQ learnability of halfspaces. Our algorithm iteratively recovers large-margin separators obtained by applying successive Forster transforms on the training data.

DSApr 2, 2024
Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution $\mathcal{D}$, unlabeled samples from test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a $2^{(k/ε)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $ε$ (prior work achieved $d^{(k/ε)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $ε$ fraction of both positive and negative examples ($ε$-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the $ε$-balanced assumption is necessary for $\mathsf{poly}(d,1/ε)$-time TDS learning for a single halfspace and (2) a $d^{\tildeΩ(\log 1/ε)}$ lower bound for the intersection of two general halfspaces, even with the $ε$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.

LGJan 16, 2025
Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random

Gautam Chandrasekaran, Vasilis Kontonis, Konstantinos Stavropoulos et al.

We study the problem of PAC learning $γ$-margin halfspaces with Massart noise. We propose a simple proper learning algorithm, the Perspectron, that has sample complexity $\widetilde{O}((εγ)^{-2})$ and achieves classification error at most $η+ε$ where $η$ is the Massart noise rate. Prior works [DGT19,CKMY20] came with worse sample complexity guarantees (in both $ε$ and $γ$) or could only handle random classification noise [DDK+23,KIT+23] -- a much milder noise assumption. We also show that our results extend to the more challenging setting of learning generalized linear models with a known link function under Massart noise, achieving a similar sample complexity to the halfspace case. This significantly improves upon the prior state-of-the-art in this setting due to [CKMY20], who introduced this model.

LGJan 15, 2025
Testing Noise Assumptions of Learning Algorithms

Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos et al.

We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework of Rubinfeld and Vasilyan (2023) and require a learner to run an associated test that satisfies the following two conditions: (1) whenever the test accepts, the learner outputs a classifier along with a certificate of optimality, and (2) the test must pass for any dataset drawn according to a specified modeling assumption on both the marginal distribution and the noise model. We then consider the problem of learning halfspaces over Gaussian marginals with Massart noise (where each label can be flipped with probability less than $1/2$ depending on the input features), and give a fully-polynomial time testable learning algorithm. We also show a separation between the classical setting of learning in the presence of structured noise and testable learning. In fact, for the simple case of random classification noise (where each label is flipped with fixed probability $η= 1/2$), we show that testable learning requires super-polynomial time while classical learning is trivial.

DSFeb 22, 2025
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees

Gautam Chandrasekaran, Adam R. Klivans, Lin Lin Lee et al.

We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.

DSNov 6, 2024
Learning Constant-Depth Circuits in Malicious Noise Models

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

The seminal work of Linial, Mansour, and Nisan gave a quasipolynomial-time algorithm for learning constant-depth circuits ($\mathsf{AC}^0$) with respect to the uniform distribution on the hypercube. Extending their algorithm to the setting of malicious noise, where both covariates and labels can be adversarially corrupted, has remained open. Here we achieve such a result, inspired by recent work on learning with distribution shift. Our running time essentially matches their algorithm, which is known to be optimal assuming various cryptographic primitives. Our proof uses a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits. We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model (i.e., contamination or so-called "nasty noise").

LGNov 17, 2025
Efficient Calibration for Decision Making

Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar et al. · harvard

A decision-theoretic characterization of perfect calibration is that an agent seeking to minimize a proper loss in expectation cannot improve their outcome by post-processing a perfectly calibrated predictor. Hu and Wu (FOCS'24) use this to define an approximate calibration measure called calibration decision loss ($\mathsf{CDL}$), which measures the maximal improvement achievable by any post-processing over any proper loss. Unfortunately, $\mathsf{CDL}$ turns out to be intractable to even weakly approximate in the offline setting, given black-box access to the predictions and labels. We suggest circumventing this by restricting attention to structured families of post-processing functions $K$. We define the calibration decision loss relative to $K$, denoted $\mathsf{CDL}_K$ where we consider all proper losses but restrict post-processings to a structured family $K$. We develop a comprehensive theory of when $\mathsf{CDL}_K$ is information-theoretically and computationally tractable, and use it to prove both upper and lower bounds for natural classes $K$. In addition to introducing new definitions and algorithmic techniques to the theory of calibration for decision making, our results give rigorous guarantees for some widely used recalibration procedures in machine learning.

DSJun 13, 2024
Efficient Discrepancy Testing for Learning with Distribution Shift

Gautam Chandrasekaran, Adam R. Klivans, Vasilis Kontonis et al.

A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing localized discrepancy distance, where discrepancy is computed with respect to a fixed output classifier. These results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift (TDS learning) due to Klivans et al. (2023). Our approach generalizes and improves all prior work on TDS learning: (1) we obtain universal learners that succeed simultaneously for large classes of test distributions, (2) achieve near-optimal error rates, and (3) give exponential improvements for constant depth circuits. Our methods further extend to semi-parametric settings and imply the first positive results for low-dimensional convex sets. Additionally, we separate learning and testing phases and obtain algorithms that run in fully polynomial time at test time.

DSJun 4, 2024
Tolerant Algorithms for Learning with Arbitrary Covariate Shift

Surbhi Goel, Abhishek Shetty, Konstantinos Stavropoulos et al.

We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [Goldwasser, A. Kalai, Y. Kalai, Montasser NeurIPS 2020], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [Klivans, Stavropoulos, Vasilyan COLT 2024], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift. We address both these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant testable learning [Rubinfeld and Vasilyan STOC 2023], and learning with nasty noise.

LGMay 19, 2023
Tester-Learners for Halfspaces: Universal Algorithms

Aravind Gollakota, Adam R. Klivans, Konstantinos Stavropoulos et al.

We give the first tester-learner for halfspaces that succeeds universally over a wide class of structured distributions. Our universal tester-learner runs in fully polynomial time and has the following guarantee: the learner achieves error $O(\mathrm{opt}) + ε$ on any labeled distribution that the tester accepts, and moreover, the tester accepts whenever the marginal is any distribution that satisfies a Poincaré inequality. In contrast to prior work on testable learning, our tester is not tailored to any single target distribution but rather succeeds for an entire target class of distributions. The class of Poincaré distributions includes all strongly log-concave distributions, and, assuming the Kannan--Lóvasz--Simonovits (KLS) conjecture, includes all log-concave distributions. In the special case where the label noise is known to be Massart, our tester-learner achieves error $\mathrm{opt} + ε$ while accepting all log-concave distributions unconditionally (without assuming KLS). Our tests rely on checking hypercontractivity of the unknown distribution using a sum-of-squares (SOS) program, and crucially make use of the fact that Poincaré distributions are certifiably hypercontractive in the SOS framework.

LGNov 2, 2020
Aggregating Incomplete and Noisy Rankings

Dimitris Fotakis, Alkis Kalavasis, Konstantinos Stavropoulos

We consider the problem of learning the true ordering of a set of alternatives from largely incomplete and noisy rankings. We introduce a natural generalization of both the classical Mallows model of ranking distributions and the extensively studied model of noisy pairwise comparisons. Our selective Mallows model outputs a noisy ranking on any given subset of alternatives, based on an underlying Mallows distribution. Assuming a sequence of subsets where each pair of alternatives appears frequently enough, we obtain strong asymptotically tight upper and lower bounds on the sample complexity of learning the underlying complete ranking and the (identities and the) ranking of the top-k alternatives from selective Mallows rankings. Moreover, building on the work of (Braverman and Mossel, 2009), we show how to efficiently compute the maximum likelihood complete ranking from selective Mallows rankings.