AI · Mar 28, 2022
A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems
Yuanyuan Dong, Andrew V. Goldberg, Alexander Noe et al.
Motivated by a real-world vehicle routing application, we consider the maximum-weight independent set problem: Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum. Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges. To solve instances of this size, we develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search (GRASP) framework. This algorithm, which we call METAMIS, uses a wider range of simple local search operations than previously described in the literature. We introduce data structures that make these operations efficient. We also introduce a new variant of path-relinking to escape local optima, as well as a new alternating augmenting-path local search move that improves algorithm performance. We compare an implementation of our algorithm with a state-of-the-art openly available code on public benchmark sets, including some large instances with hundreds of millions of vertices. Our algorithm is, in general, competitive and outperforms this openly available code on large vehicle routing instances. We hope that our results will lead to even better MWIS algorithms.
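To make the problem statement concrete, here is a minimal Python sketch of the MWIS objective together with a greedy construction and one very simple local search move (insert a node and evict its conflicting neighbors when that increases total weight). This is not METAMIS; the function names, the adjacency-dict representation, and the move itself are illustrative assumptions.

```python
def is_independent(adj, nodes):
    """True if no two chosen nodes are adjacent (assumes no self-loops)."""
    chosen = set(nodes)
    return all(adj[u].isdisjoint(chosen) for u in chosen)


def total_weight(weight, nodes):
    return sum(weight[v] for v in nodes)


def greedy_mwis(adj, weight):
    """Pick nodes by decreasing weight, skipping nodes adjacent to earlier picks."""
    chosen, blocked = set(), set()
    for v in sorted(adj, key=lambda v: (-weight[v], v)):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen


def improve_by_insertion(adj, weight, chosen):
    """Hypothetical local move: insert v and remove its chosen neighbors
    whenever that strictly increases the total weight."""
    chosen = set(chosen)
    improved = True
    while improved:
        improved = False
        for v in adj:
            if v in chosen:
                continue
            conflicts = adj[v] & chosen
            if weight[v] > total_weight(weight, conflicts):
                chosen -= conflicts
                chosen.add(v)
                improved = True
    return chosen


# Path a - b - c: the heavy middle node traps both heuristics.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_w = {"a": 2, "b": 3, "c": 2}
start = greedy_mwis(path, path_w)
result = improve_by_insertion(path, path_w, start)
```

On this path the sketch stays at `{"b"}` (weight 3) although `{"a", "c"}` has weight 4, because no single insertion is improving. Small cases like this are why richer moves and mechanisms for escaping local optima matter.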
LG · Mar 2, 2022
Near-Optimal Correlation Clustering with Privacy
Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi et al.
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our approximation guarantees are stronger than those shown in prior work and are optimal up to logarithmic factors.
CG · Sep 28, 2023
Multi-Swap $k$-Means++
Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi et al.
The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' algorithm of choice for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, thereby allowing multiple centers to be swapped at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly, our approach also yields substantial practical improvements: we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
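As a rough illustration of the building blocks mentioned above, the following Python sketch shows $D^2$ (k-means++) seeding followed by a single-swap local search step that samples a candidate from the same distribution. It only illustrates the single-swap idea, not the multi-swap algorithm of this paper; all function names are hypothetical and the code favors clarity over speed.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_sample(points, centers, rng):
    """Sample a point with probability proportional to its current cost."""
    weights = [min(sq_dist(p, c) for c in centers) for p in points]
    if sum(weights) == 0:  # every point already coincides with a center
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]


def kmeanspp_seed(points, k, rng):
    """First center uniform at random, the rest by D^2 sampling."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(d2_sample(points, centers, rng))
    return centers


def single_swap_step(points, centers, rng):
    """Sample one candidate; swap it for the center whose removal gives
    the lowest cost, but only if that strictly improves the objective."""
    candidate = d2_sample(points, centers, rng)
    best, best_cost = centers, cost(points, centers)
    for i in range(len(centers)):
        trial = centers[:i] + [candidate] + centers[i + 1:]
        trial_cost = cost(points, trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best


rng = random.Random(0)
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers = kmeanspp_seed(points, 3, rng)
seed_cost = cost(points, centers)
for _ in range(5):
    centers = single_swap_step(points, centers, rng)
final_cost = cost(points, centers)
```

Because a swap is accepted only when it lowers the objective, `final_cost` can never exceed `seed_cost`. A multi-swap variant would consider exchanging several centers at once, at a higher cost per step.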
CL · Jul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
DS · Apr 30
Computing the (k+2)-Edge-Connected Components in k-Edge-Connected Digraphs in Subquadratic Time
Loukas Georgiadis, Evangelos Kipouridis, Evangelos Kosinas et al.
Computing edge-connected components in directed and undirected graphs is a fundamental and well-studied problem in graph algorithms. In a very recent breakthrough, Korhonen [STOC 2025] showed that for any fixed $k$, the $k$-edge connected components of an undirected graph can be computed in linear time. In contrast, the directed case remains significantly more challenging: linear-time algorithms are only known for $k \le 3$, and for any fixed $k > 3$, the best known bound for sparse or moderately dense graphs is still the $O(mn)$-time algorithm of Nagamochi and Watanabe (1993). In this paper, we break the $O(mn)$ barrier for all $k = o(n^{1/4}/\sqrt{\log{n}})$. We present a randomized algorithm that computes the $(k+2)$-edge-connected components of a $k$-edge-connected directed graph in $O(k^2 m \sqrt{n} \log n)$ time, for any $k$. This constitutes the first improvement over the classic Nagamochi-Watanabe bound for any constant $k > 3$. Our approach introduces new structural insights into directed edge-cuts and combines these with both new and existing techniques. A central contribution of our work is a substantial simplification and generalization of the framework introduced in [GKPP:3ECC], which achieved an $\widetilde{O}(m\sqrt{m})$ bound for computing the $3$-edge-connected components of a digraph. In addition, we develop a variant of our algorithm that achieves the same $O(m \sqrt{n} \log n)$ running time for computing the $4$-edge-connected components of a general directed graph.
DC · Jul 14, 2025
Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality
Filipe Miguel Gonçalves de Almeida, CJ Carey, Hendrik Fichtenberger et al.
Learning and constructing large-scale graphs has attracted attention in recent decades, resulting in a rich literature that introduced various systems, tools, and algorithms. Grale is one such tool, designed for offline environments and deployed in more than 50 different industrial settings at Google. Grale is widely applicable because of its ability to efficiently learn and construct a graph on datasets with multiple types of features. However, applications often require the underlying data to evolve continuously and rapidly, and the updated graph needs to be available with low latency. Such settings make the use of Grale prohibitive. While there are Approximate Nearest Neighbor (ANN) systems that handle dynamic updates with low latency, they are mostly limited to similarities over a single embedding. In this work, we introduce a system that inherits the advantages and the quality of Grale, and maintains a graph construction in a dynamic setting with tens of milliseconds of latency per request. We call the system Dynamic Grale Using ScaNN (Dynamic GUS). Our system has a wide range of applications with over 10 deployments at Google. One of the applications is in Android Security and Privacy, where Dynamic Grale Using ScaNN enables capturing harmful applications 4 times faster, before they can reach users.
DS · Jun 13, 2024
Dynamic Correlation Clustering in Sublinear Update Time
Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori et al.
We study the classic problem of correlation clustering in dynamic node streams. In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge. The objective is to continuously find a partition which minimizes the sum of positive edges crossing clusters and negative edges within clusters. We present an algorithm that maintains an $O(1)$-approximation with $O(\mathrm{polylog}\, n)$ amortized update time. Prior to our work, Behnezhad, Charikar, Ma, and L. Tan achieved a $5$-approximation with $O(1)$ expected update time in edge streams, which translates in node streams to an $O(D)$ update time, where $D$ is the maximum possible degree. Finally, we complement our theoretical analysis with experiments on real-world data.
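The objective described above can be written down directly. The sketch below counts disagreements for a given partition and, for comparison, runs the classic randomized pivot heuristic from the static correlation clustering literature. It is a static illustration only, not the dynamic algorithm of this paper; the function names and the representation of positive pairs as a set of `frozenset` objects are assumptions made for the example.

```python
import random
from itertools import combinations


def disagreements(nodes, positive, cluster_of):
    """Count positive pairs split across clusters plus negative pairs
    placed in the same cluster. Pairs not in `positive` are negative."""
    total = 0
    for u, v in combinations(nodes, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        if same_cluster != is_positive:
            total += 1
    return total


def pivot_clustering(nodes, positive, rng):
    """Visit nodes in random order; each unclustered node becomes a pivot
    and absorbs its still-unclustered positive neighbors."""
    order = list(nodes)
    rng.shuffle(order)
    cluster_of = {}
    for pivot in order:
        if pivot in cluster_of:
            continue
        cluster_of[pivot] = pivot
        for v in order:
            if v not in cluster_of and frozenset((pivot, v)) in positive:
                cluster_of[v] = pivot
    return cluster_of


# Two positive cliques {1, 2, 3} and {4, 5}; every other pair is negative.
nodes = [1, 2, 3, 4, 5]
positive = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (4, 5)]}
clusters = pivot_clustering(nodes, positive, random.Random(0))
pivot_cost = disagreements(nodes, positive, clusters)
one_cluster_cost = disagreements(nodes, positive, {v: 0 for v in nodes})
singleton_cost = disagreements(nodes, positive, {v: v for v in nodes})
```

On this perfectly clusterable input, any pivot order recovers the two cliques, so `pivot_cost` is 0, while putting everything in one cluster pays for the 6 negative pairs and all-singletons pays for the 4 positive pairs.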
DS · Jun 15, 2021
Correlation Clustering in Constant Many Parallel Rounds
Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović et al.
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.
LG · Jun 4, 2018
Online Reciprocal Recommendation with Theoretical Performance Guarantees
Fabio Vitale, Nikos Parotsidis, Claudio Gentile
A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend to the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem is thus sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both users. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, we design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to that achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.