20.9 · DS · Apr 11
On the Approximability of Max-Cut on 3-Colorable Graphs and Graphs with Large Independent Sets · Suprovat Ghoshal, Neng Huang, Euiwoong Lee et al.
Max-Cut is a classical graph-partitioning problem where given a graph $G = (V,E)$, the objective is to find a cut $(S,S^c)$ which maximizes the number of edges crossing the cut. In a seminal work, Goemans and Williamson gave an $α_{GW} \approx 0.87856$-factor approximation algorithm for the problem, which was later shown to be tight by the work of Khot, Kindler, Mossel, and O'Donnell. Since then, there has been steady progress in understanding the approximability at even finer levels, and a fundamental goal in this context is to understand how the structure of the underlying graph affects the approximability of the Max-Cut problem. In this work, we investigate this question by exploring how the chromatic structure of a graph affects the Max-Cut problem. While it is well-known that Max-Cut can be solved perfectly and near-perfectly in $2$-colorable and almost $2$-colorable graphs in polynomial time, here we explore its approximability under much weaker structural conditions such as when the graph is $3$-colorable or contains a large independent set. Our main contributions in this context are as follows: 1. We show Max-Cut is $α_{GW}$-hard to approximate for $3$-colorable graphs. 2. We identify a natural threshold $α^*$ such that the following holds. Firstly, for graphs which contain an independent set of size up to $α^*$, Max-Cut continues to be $α_{GW}$-factor hard to approximate. Furthermore, for any graph that contains an independent set of size $> α^*$, there exists an efficient $>α_{GW}$-approximation algorithm for Max-Cut. Our hardness results are derived using various analytical tools and novel variants of the Majority-Is-Stablest theorem, which might be of independent interest. Our algorithmic results are based on a novel SDP relaxation, which is then rounded and analyzed using interval arithmetic.
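To make the objective concrete, here is a minimal Python sketch of the cut value and an exhaustive Max-Cut search. It is not the SDP-based algorithm from the paper. The function names and the two toy graphs are illustrative assumptions, and the search is exponential, so it only suits tiny instances.

```python
from itertools import product

def cut_value(edges, side):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

def max_cut_brute_force(n, edges):
    """Try all 2^(n-1) cuts of vertices 0..n-1 (exponential; tiny graphs only)."""
    best_value, best_side = -1, None
    for bits in product((0, 1), repeat=n - 1):
        side = (0,) + bits  # fix vertex 0 on side 0, since cuts are symmetric
        value = cut_value(edges, side)
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side

# Hypothetical toy instances.
# A 4-cycle is 2-colorable, so all of its edges can be cut.
c4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
# A triangle is 3-colorable but not 2-colorable: at most 2 of its 3 edges can be cut.
k3_edges = [(0, 1), (1, 2), (0, 2)]
```

On the 4-cycle the search cuts every edge, while on the triangle one edge always stays uncut. This is the simplest case of the gap between $2$-colorable and $3$-colorable inputs.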
LG · Jan 24, 2025
Humanity's Last Exam · Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
DS · May 16, 2024
A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering · Sayan Bandyapadhyay, Eden Chlamtáč, Zachary Friggstad et al.
In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, only bi-criteria approximation and exponential-time algorithms follow for this problem from the prior work on fair clustering problems when $\ell > 2$. In our work, focusing on the $\ell > 2$ case, we design the first polynomial-time $O(k^2\cdot \ell \cdot t)$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result by providing hardness of approximation results, which show that our problem even when $\ell=2$ is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
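The pairwise fairness constraint can be checked directly from group counts. The sketch below is an illustration under assumptions, not code from the paper: each cluster is given as a list of group labels in `0..num_groups-1`, and the function name is hypothetical. It uses the fact that "every group has at most $t$ times as many points as any other group" holds exactly when the largest group count is at most $t$ times the smallest.

```python
from collections import Counter

def is_pairwise_fair(clusters, num_groups, t):
    """Check that, in every cluster, each group has at most t times
    as many points as any other group.

    clusters: list of clusters, each a list of group labels in 0..num_groups-1
              (an assumed input encoding).
    """
    for cluster in clusters:
        counts = Counter(cluster)
        sizes = [counts[g] for g in range(num_groups)]
        # count_i <= t * count_j for all pairs (i, j)  <=>  max <= t * min
        if max(sizes) > t * min(sizes):
            return False
    return True
```

Note that a non-empty cluster that misses a group entirely is always infeasible under this definition, since the missing group has count $0$.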
91.2 · DS · Mar 31
Approximation algorithms for satisfiable and nearly satisfiable ordering CSPs · Yury Makarychev
We study approximation algorithms for satisfiable and nearly satisfiable instances of ordering constraint satisfaction problems (ordering CSPs). Ordering CSPs arise naturally in ranking and scheduling, yet their approximability remains poorly understood beyond a few isolated cases. We introduce a general framework for designing approximation algorithms for ordering CSPs. The framework relaxes an input instance to an auxiliary ordering CSP, solves the relaxation, and then applies a randomized transformation to obtain an ordering for the original instance. This reduces the search for approximation algorithms to an optimization problem over randomized transformations. Our main technical contribution is to show that the power of this framework is captured by a structured class of transformations, which we call strong IDU transformations: every transformation used in the framework can be replaced by a strong IDU transformation without weakening the resulting approximation guarantee. We then classify strong IDU transformations and show that optimizing over them reduces to an explicit optimization problem whose dimension depends only on the maximum predicate arity $k$ and the desired precision $δ> 0$. As a consequence, for any finite ordering constraint language, we can compute a strong IDU transformation whose guarantee is within $δ$ of the best guarantee achievable by the framework, in time depending only on $k$ and $δ$. The framework applies broadly and yields nontrivial approximation guarantees for a wide class of ordering predicates.
DS · Feb 3, 2022
Fair Representation Clustering with Several Protected Classes · Zhen Dai, Yury Makarychev, Ali Vakilian
We study the problem of fair $k$-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation $k$-median problem, we are given a set of points $X$ in a metric space. Each point $x\in X$ belongs to one of $\ell$ groups. Further, we are given fair representation parameters $α_j$ and $β_j$ for each group $j\in [\ell]$. We say that a $k$-clustering $C_1, \ldots, C_k$ fairly represents all groups if the number of points from group $j$ in cluster $C_i$ is between $α_j |C_i|$ and $β_j |C_i|$ for every $j\in[\ell]$ and $i\in [k]$. The goal is to find a set $\mathcal{C}$ of $k$ centers and an assignment $φ: X\rightarrow \mathcal{C}$ such that the clustering defined by $(\mathcal{C}, φ)$ fairly represents all groups and minimizes the $\ell_1$-objective $\sum_{x\in X} d(x, φ(x))$. We present an $O(\log k)$-approximation algorithm that runs in time $n^{O(\ell)}$. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both $k$ and $\ell$. We also consider an important special case of the problem where $α_j = β_j = \frac{f_j}{f}$ and $f_j, f \in \mathbb{N}$ for all $j\in [\ell]$. For this special case, we present an $O(\log k)$-approximation algorithm that runs in $(kf)^{O(\ell)}\log n + \mathrm{poly}(n)$ time.
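The fair representation condition is a per-cluster, per-group interval check. The following is a small sketch that assumes clusters are lists of group labels `0..ℓ-1` and that `alpha` and `beta` are sequences indexed by group; the names are illustrative and not taken from the paper.

```python
from collections import Counter

def fairly_represents(clusters, alpha, beta):
    """Return True if, for every cluster C_i and group j, the number of
    points of group j in C_i lies in [alpha[j] * |C_i|, beta[j] * |C_i|].

    clusters: list of clusters, each a list of group labels (assumed encoding).
    alpha, beta: per-group lower and upper representation fractions.
    """
    for cluster in clusters:
        size = len(cluster)
        counts = Counter(cluster)
        for j in range(len(alpha)):
            if not (alpha[j] * size <= counts[j] <= beta[j] * size):
                return False
    return True
```

For example, with two groups and bounds $α_j = 0.25$, $β_j = 0.75$, a cluster with two points from each group is feasible, while a cluster with four points from group 0 and one from group 1 is not, because $1 < 0.25 \cdot 5$.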
DS · Nov 8, 2021
Approximating Fair Clustering with Cascaded Norm Objectives · Eden Chlamtáč, Yury Makarychev, Ali Vakilian
We introduce the $(p,q)$-Fair Clustering problem. In this problem, we are given a set of points $P$ and a collection of different weight functions $W$. We would like to find a clustering which minimizes the $\ell_q$-norm of the vector over $W$ of the $\ell_p$-norms of the weighted distances of points in $P$ from the centers. This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means, and is closely connected to other problems such as Densest $k$-Subgraph and Min $k$-Union. We utilize convex programming techniques to approximate the $(p,q)$-Fair Clustering problem for different values of $p$ and $q$. When $p\geq q$, we get an $O(k^{(p-q)/(2pq)})$-approximation, which nearly matches a $k^{Ω((p-q)/(pq))}$ lower bound based on conjectured hardness of Min $k$-Union and other problems. When $q\geq p$, we get an approximation which is independent of the size of the input for bounded $p,q$, and also matches the recent $O((\log n/(\log\log n))^{1/p})$-approximation for $(p, \infty)$-Fair Clustering by Makarychev and Vakilian (COLT 2021).
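The cascaded objective can be evaluated in two passes: an inner $\ell_p$-norm per weight function, then an outer $\ell_q$-norm over those values. The sketch below rests on an assumption, since the abstract does not spell out the formula: it takes the inner term to be $\left(\sum_x w(x)\, d(x,C)^p\right)^{1/p}$ with finite $p$. The function names and the list-based encoding of weight functions are likewise illustrative.

```python
import math

def lq_norm(values, q):
    """l_q norm of a list of non-negative numbers; q may be math.inf."""
    if q == math.inf:
        return max(values, default=0.0)
    return sum(v ** q for v in values) ** (1.0 / q)

def cascaded_cost(points, centers, weight_functions, p, q, dist):
    """Assumed (p,q) objective: l_q norm over weight functions w of
    (sum_x w[x] * d(x, C)^p)^(1/p), where d(x, C) is the distance from x
    to its nearest center. weight_functions[i][j] is the weight that the
    i-th weight function gives to points[j]. p must be finite here.
    """
    nearest = [min(dist(x, c) for c in centers) for x in points]
    inner = [
        sum(w[j] * nearest[j] ** p for j in range(len(points))) ** (1.0 / p)
        for w in weight_functions
    ]
    return lq_norm(inner, q)
```

With indicator weight functions (one per group) and $q = \infty$, this reduces to the maximum over groups of a per-group $\ell_p$ cost, which is the socially fair setting named in the abstract.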
DS · Aug 11, 2021
Local Correlation Clustering with Asymmetric Classification Errors · Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev et al.
In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as "similar" and "dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\text{dis}$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\text{dis}_v$ equals the total weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in the range of $[α\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $α\mathbf{w}$ (where $α\leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left(\left(\frac{1}{α}\right)^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{α}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
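The disagreements vector and its $\ell_p$ norm can be computed in one pass over the edges. This sketch assumes a dictionary encoding of edges and of the clustering, and the function names are hypothetical. It only evaluates the objective for a given clustering and does not implement the approximation algorithm.

```python
import math

def disagreements_vector(vertices, edges, cluster_of):
    """dis[v] = total weight of disagreeing edges incident on v.

    edges: dict mapping (u, v) -> (label, weight), with label "similar"
           or "dissimilar" (an assumed encoding).
    cluster_of: dict mapping each vertex to its cluster id.
    """
    dis = {v: 0.0 for v in vertices}
    for (u, v), (label, weight) in edges.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        disagrees = (label == "similar" and not same_cluster) or (
            label == "dissimilar" and same_cluster
        )
        if disagrees:
            dis[u] += weight
            dis[v] += weight
    return dis

def lp_norm(values, p):
    """l_p norm for p >= 1; p may be math.inf."""
    values = [abs(x) for x in values]
    if p == math.inf:
        return max(values, default=0.0)
    return sum(x ** p for x in values) ** (1.0 / p)

# Hypothetical 3-vertex instance.
example_edges = {
    (0, 1): ("similar", 1.0),
    (1, 2): ("similar", 1.0),
    (0, 2): ("dissimilar", 0.5),
}
```

In the example, putting all three vertices in one cluster violates only the dissimilar edge, giving $\text{dis} = (0.5, 0, 0.5)$. Splitting off vertex 2 violates the similar edge $(1,2)$ instead, giving $\text{dis} = (0, 1, 1)$.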
DS · Aug 11, 2021
Correlation Clustering with Asymmetric Classification Errors · Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev et al.
In the Correlation Clustering problem, we are given a weighted graph $G$ with its edges labeled as "similar" or "dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of "disagreements": the sum of the weights of "similar" edges across clusters and "dissimilar" edges within clusters. We study the correlation clustering problem under the following assumption: Every "similar" edge $e$ has weight $\mathbf{w}_e\in[α\mathbf{w}, \mathbf{w}]$ and every "dissimilar" edge $e$ has weight $\mathbf{w}_e\geq α\mathbf{w}$ (where $α\leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give a $(3 + 2 \log_e (1/α))$ approximation algorithm for this problem. This assumption captures well the scenario when classification errors are asymmetric. Additionally, we show an asymptotically matching Linear Programming integrality gap of $Ω(\log 1/α)$.
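To see how the guarantee depends on $α$, the sketch below checks the weight assumption and evaluates the stated factor $3 + 2\log_e(1/α)$. The function names and the `(label, weight)` edge encoding are assumptions made for illustration; the code only restates the formula from the abstract.

```python
import math

def satisfies_weight_assumption(edges, alpha, w):
    """Check that similar edges have weight in [alpha*w, w] and
    dissimilar edges have weight at least alpha*w.

    edges: iterable of (label, weight) pairs (an assumed encoding).
    """
    for label, weight in edges:
        if label == "similar" and not (alpha * w <= weight <= w):
            return False
        if label == "dissimilar" and weight < alpha * w:
            return False
    return True

def approximation_factor(alpha):
    """The factor 3 + 2 ln(1/alpha) stated in the abstract, for 0 < alpha <= 1."""
    return 3 + 2 * math.log(1 / alpha)
```

For $α = 1$ the factor equals $3$, and for $α = 1/e$ it equals $5$, so the guarantee degrades only logarithmically as the allowed weight range widens.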
DS · Mar 3, 2021
Approximation Algorithms for Socially Fair Clustering · Yury Makarychev, Ali Vakilian
We present an $(e^{O(p)} \frac{\log \ell}{\log\log\ell})$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Ghadiri, Samadi, and Vempala [2021] and Abbasi, Bhaskara, and Venkatasubramanian [2021]. Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $Ω(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $Θ(\frac{\log \ell}{\log\log\ell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].
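The difference between the socially fair objective and the usual total cost already shows up on a line with one center. The sketch below uses one-dimensional points and a hypothetical candidate set, both chosen for illustration. It evaluates the objective $\max_j \sum_{u \text{ in group } j} d(u,C)^p$ by brute force and is not the LP-based algorithm from the paper.

```python
def group_costs(points_by_group, centers, p=1):
    """Per-group cost: sum over the group's points of d(u, C)^p (points on a line)."""
    return [
        sum(min(abs(x - c) for c in centers) ** p for x in pts)
        for pts in points_by_group
    ]

def socially_fair_cost(points_by_group, centers, p=1):
    """Maximum over groups of the group's clustering cost."""
    return max(group_costs(points_by_group, centers, p))

def total_cost(points_by_group, centers, p=1):
    """Standard objective: sum of costs over all groups."""
    return sum(group_costs(points_by_group, centers, p))

# Hypothetical instance: group 0 has three points at 0, group 1 has two points at 4.
groups = [[0, 0, 0], [4, 4]]
candidates = [0, 2, 4]  # illustrative candidate locations for a single center (k = 1)

best_total = min(candidates, key=lambda c: total_cost(groups, [c]))
best_fair = min(candidates, key=lambda c: socially_fair_cost(groups, [c]))
```

Here the total-cost optimum places the center at 0 (total cost 8, but group 1 pays all of it), while the socially fair optimum places it at 2, where the worst group pays 6.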
LG · May 28, 2019
Efficient Kirszbraun Extension with Applications to Regression · Hanan Zaichyk, Armin Biess, Aryeh Kontorovich et al.
We introduce a framework for performing regression between two Hilbert spaces. The framework is based on Kirszbraun's extension theorem and is, to the best of our knowledge, the first application of this technique to supervised learning. We analyze the statistical and computational aspects of this method. We decompose this task into two stages: training (which corresponds operationally to smoothing/regularization) and prediction (which is achieved via Kirszbraun extension). Both are solved algorithmically via a novel multiplicative weight updates (MWU) scheme, which, for our problem formulation, achieves a quadratic runtime improvement over the state of the art. Our empirical results indicate a dramatic improvement over standard off-the-shelf solvers in our setting.
DS · Nov 8, 2018
Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions · Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev et al.
We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. The notion is a natural analogue of the notion of a Lipschitz extension of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems. * We prove a prioritized variant of the Johnson-Lindenstrauss lemma: given a set of points $X\subset \mathbb{R}^d$ of size $N$ and a permutation ("priority ranking") of $X$, there exists an embedding $f$ of $X$ into $\mathbb{R}^{O(\log N)}$ with distortion $O(\log \log N)$ such that the point of rank $j$ has only $O(\log^{3 + \varepsilon} j)$ non-zero coordinates (more specifically, all but the first $O(\log^{3+\varepsilon} j)$ coordinates are equal to $0$); the distortion of $f$ restricted to the first $j$ points (according to the ranking) is at most $O(\log\log j)$. The result makes progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions. * We prove that given a set $X$ of $N$ points in $\mathbb{R}^d$, there exists a terminal dimension reduction embedding of $\mathbb{R}^d$ into $\mathbb{R}^{d'}$, where $d' = O\left(\frac{\log N}{\varepsilon^4}\right)$, which preserves distances $\|x-y\|$ between points $x\in X$ and $y \in \mathbb{R}^{d}$, up to a multiplicative factor of $1 \pm \varepsilon$. This improves a recent result by Elkin, Filtser, and Neiman. The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary.
DS · Nov 8, 2018
Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering · Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn
Consider an instance of Euclidean $k$-means or $k$-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k / \varepsilon) / \varepsilon^2)$-dimensional subspace. Further, the cost of every clustering is preserved within $(1+\varepsilon)$. More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean $k$-clustering with the distances raised to the $p$-th power for any constant $p$. For $k$-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for $k$-medians, it answers a question raised by Kannan.
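The statement can be probed empirically by comparing the $k$-means cost of a fixed clustering before and after a random linear map. The sketch below swaps in a scaled Gaussian matrix rather than a projection onto a random subspace; that is an assumption on our part, chosen because it is easy to write with the standard library. The target dimension is left as an explicit parameter because the constant hidden in $O(\log(k/\varepsilon)/\varepsilon^2)$ is not given here. All function names are illustrative.

```python
import math
import random

def kmeans_cost(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        dim = len(cluster[0])
        centroid = [sum(x[i] for x in cluster) / len(cluster) for i in range(dim)]
        total += sum(
            sum((x[i] - centroid[i]) ** 2 for i in range(dim)) for x in cluster
        )
    return total

def gaussian_map(dim, target_dim, seed=0):
    """target_dim x dim matrix with i.i.d. N(0, 1/target_dim) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(target_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(target_dim)]

def apply_map(matrix, x):
    """Matrix-vector product, returned as a tuple."""
    return tuple(sum(row[i] * x[i] for i in range(len(x))) for row in matrix)

def projected_cost_ratio(clusters, target_dim, seed=0):
    """Cost of the same clustering after the random map, divided by the original cost."""
    dim = len(clusters[0][0])
    matrix = gaussian_map(dim, target_dim, seed)
    projected = [[apply_map(matrix, x) for x in cluster] for cluster in clusters]
    return kmeans_cost(projected) / kmeans_cost(clusters)
```

On high-dimensional data one would expect `projected_cost_ratio` to concentrate near 1 as `target_dim` grows, in line with the $(1+\varepsilon)$ guarantee. This sketch does not verify the bound itself.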
DS · Nov 10, 2015
Learning Communities in the Presence of Errors · Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan
We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors. In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors: Feige-Kilian (monotone) errors and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether an almost exact recovery is possible when the adversary is allowed to add $o(n)$ edges. Our work answers this question affirmatively even in the case of $k>2$ communities. We then show that our algorithms work not only when the instances come from SBM, but also when the instances come from any distribution of graphs that is $εm$-close to SBM in the Kullback-Leibler divergence. This result also holds in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.
DS · Jun 22, 2014
Correlation Clustering with Noisy Partial Information · Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan
In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs $G$. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value $(1+δ)\,\text{optcost} + O_δ(n\log^3 n)$ with high probability, where $\text{optcost}$ is the value of the optimal solution (for every $δ > 0$). The second algorithm finds the ground truth clustering with an arbitrarily small classification error $η$ (under some additional assumptions on the instance).
DS · Jun 22, 2014
Constant Factor Approximation for Balanced Cut in the PIE model · Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan
We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices $V$ partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{random}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in $L$ and in $R$). Then we say that $G + E_{random}$ is a graph with permutation-invariant random edges. We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n \,\text{polylog}(n)$ in this model. In the regime when $|E_{random}| = Ω(n \,\text{polylog}(n))$, this is a constant factor approximation with respect to the cost of the planted cut.
LG · May 13, 2014
Clustering, Hamming Embedding, Generalized LSH and the Max Norm · Behnam Neyshabur, Yury Makarychev, Nathan Srebro
We study the convex relaxation of clustering and Hamming embedding, focusing on the asymmetric case (co-clustering and asymmetric Hamming embedding), understanding their relationship to LSH as studied by Charikar (2002) and to the max-norm ball, and the differences between their symmetric and asymmetric versions.
LG · Nov 29, 2013
The Power of Asymmetry in Binary Hashing · Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev et al.
When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. That is, we approximate the similarity between $x$ and $x'$ as the Hamming distance between $f(x)$ and $g(x')$, for two distinct binary codes $f,g$, rather than as the Hamming distance between $f(x)$ and $f(x')$.