Claire Mathieu

4papers

716citations

Novelty50%

AI Score26

Ranked #166,496 of 205,806 authors (top 81%)#519 in DS (top 90%)

4 Papers

DSNov 5, 2018

How to aggregate Top-lists: Approximation algorithms via scores and average ranks

Claire Mathieu, Simon Mauras

A top-list is a possibly incomplete ranking of elements: only a subset of the elements are ranked, with all unranked elements tied for last. Top-list aggregation, a generalization of the well-known rank aggregation problem, takes as input a collection of top-lists and aggregates them into a single complete ranking, aiming to minimize the number of upsets (pairs ranked in opposite order in the input and in the output). In this paper, we give simple approximation algorithms for top-list aggregation. * We generalize the footrule algorithm for rank aggregation. * Using inspiration from approval voting, we define the score of an element as the frequency with which it is ranked, i.e. appears in an input top-list. We reinterpret Ailon's RepeatChoice algorithm for top-list aggregation using the score of an element and its average rank given that it is ranked. * Using average ranks, we generalize and analyze Borda's algorithm for rank aggregation. * We design a simple 2-phase variant of the Generalized Borda's algorithm, roughly sorting by scores and breaking ties by average ranks. * We then design another 2-phase variant in which in order to break ties we use, as a black box, the Mathieu-Schudy PTAS for rank aggregation, yielding a PTAS for top-list aggregation. * Finally, we discuss the special case in which all input lists have constant length.

DSJun 21, 2018

Instance-Optimality in the Noisy Value-and Comparison-Model --- Accept, Accept, Strong Accept: Which Papers get in?

Vincent Cohen-Addad, Frederik Mallmann-Trenn, Claire Mathieu

Motivated by crowdsourced computation, peer-grading, and recommendation systems, Braverman, Mao and Weinberg [STOC'16] studied the \emph{query} and \emph{round} complexity of fundamental problems such as finding the maximum (\textsc{max}), finding all elements above a certain value (\textsc{threshold-$v$}) or computing the top$-k$ elements (\textsc{Top}-$k$) in a noisy environment. For example, consider the task of selecting papers for a conference. This task is challenging due the crowdsourcing nature of peer reviews: the results of reviews are noisy and it is necessary to parallelize the review process as much as possible. We study the noisy value model and the noisy comparison model: In the \emph{noisy value model}, a reviewer is asked to evaluate a single element: "What is the value of paper $i$?" (\eg accept). In the \emph{noisy comparison model} (introduced in the seminal work of Feige, Peleg, Raghavan and Upfal [SICOMP'94]) a reviewer is asked to do a pairwise comparison: "Is paper $i$ better than paper $j$?" In this paper, we show optimal worst-case query complexity for the \textsc{max},\textsc{threshold-$v$} and \textsc{Top}-$k$ problems. For \textsc{max} and \textsc{Top}-$k$, we obtain optimal worst-case upper and lower bounds on the round vs query complexity in both models. For \textsc{threshold}-$v$, we obtain optimal query complexity and nearly-optimal round complexity, where $k$ is the size of the output) for both models. We then go beyond the worst-case and address the question of the importance of knowledge of the instance by providing, for a large range of parameters, instance-optimal algorithms with respect to the query complexity. Furthermore, we show that the value model is strictly easier than the comparison model.

DSApr 7, 2017

Hierarchical Clustering: Objective Functions and Algorithms

Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn et al.

Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a `good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties. We take an axiomatic approach to defining `good' objective functions for both similarity and dissimilarity-based hierarchical clustering. We characterize a set of "admissible" objective functions (that includes Dasgupta's one) that have the property that when the input admits a `natural' hierarchical clustering, it has an optimal value. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better algorithms. For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an $O(\log^{3/2} n)$-approximation. We give a refined analysis of the algorithm and show that it in fact achieves an $O(\sqrt{\log n})$-approx. (Charikar and Chatziafratis independently proved that it is a $O(\sqrt{\log n})$-approx.). This improves upon the LP-based $O(\log n)$-approx. of Roy and Pokutta. For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approx., and provide a simple and better algorithm that gives a factor 3/2 approx.. Finally, we consider `beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function has desirable properties for these inputs and we provide a simple 1 + o(1)-approximation in this setting.

DSAug 16, 2016

Optimization of Bootstrapping in Circuits

Fabrice Benhamouda, Tancrède Lepoint, Claire Mathieu et al.

In 2009, Gentry proposed the first Fully Homomorphic Encryption (FHE) scheme, an extremely powerful cryptographic primitive that enables to perform computations, i.e., to evaluate circuits, on encrypted data without decrypting them first. This has many applications, in particular in cloud computing. In all currently known FHE schemes, encryptions are associated to some (non-negative integer) noise level, and at each evaluation of an AND gate, the noise level increases. This is problematic because decryption can only work if the noise level stays below some maximum level $L$ at every gate of the circuit. To ensure that property, it is possible to perform an operation called \emph{bootstrapping} to reduce the noise level. However, bootstrapping is time-consuming and has been identified as a critical operation. This motivates a new problem in discrete optimization, that of choosing where in the circuit to perform bootstrapping operations so as to control the noise level; the goal is to minimize the number of bootstrappings in circuits. In this paper, we formally define the \emph{bootstrap problem}, we design a polynomial-time $L$-approximation algorithm using a novel method of rounding of a linear program, and we show a matching hardness result: $(L-ε)$-inapproximability for any $ε>0$.