Shiri Chechik

DS
5papers
29citations
Novelty48%
AI Score45

5 Papers

DSMar 29
Girth Approximations in the CONGEST Model

Shiri Chechik, Gur Lifshitz, Doron Mukhtar

This paper advances the state of the art in girth approximation within the CONGEST model. Manoharan and Ramachandran [PODC '24] provided the first significant improvement in girth approximation in over a decade. We build on this momentum and make progress on all fronts: we provide a unified family of algorithms yielding girth approximation-round tradeoffs for undirected networks; we obtain improved bounds for directed networks; and we establish better lower bounds for directed and undirected weighted networks. Together, these results substantially narrow the remaining complexity gaps across all settings. Specifically, for networks with $n$ nodes and hop-diameter $D$, we show that one can compute, with high probability: $(1)$ An $f$-approximation for unweighted undirected girth in $\tilde{O}(n^{1/f}+D)$ rounds, for every constant integer $f>2$, $(2)$ A $(2k-1+o(1))$-approximation for weighted undirected girth in $\tilde{O}(n^{(k+1)/(2k+1)}+D)$ rounds, for every constant integer $k>1$, and $(3)$ A $2$-approximation for directed unweighted girth, and a $(2+\varepsilon)$-approximation for directed weighted girth, both in $\tilde{O}(n^{2/3}+D)$ rounds. We also prove new lower bounds for directed networks and for undirected weighted networks: for every integer $k > 2$ and $\varepsilon>0$, assuming the Erdős-Simonovits' even cycle conjecture (and unconditionally for $k\in\{3,4,6\}$), any $(k-\varepsilon)$-approximation for the girth requires $\tildeΩ(n^{k/(2k-1)})$ rounds, even when $D = O(\log n)$.

DSMay 8
Faster Deterministic Streaming Vertex Coloring

Shiri Chechik, Hongyi Chen, Tianyi Zhang

Graph coloring is a fundamental problem in computer science. In the semi-streaming model, an input graph $G$ on $n$ vertices and maximum degree $Δ$ is presented as a stream of edges, and the goal is to compute a vertex coloring using a small number of colors while storing only $\tilde{O}(n)$ bits of memory. Recent work has revealed an exponential separation between randomized and deterministic approaches in this setting: while randomized algorithms can achieve a $(Δ+1)$-coloring in a single pass [Assadi, Chen, and Khanna, 2019], any single-pass deterministic algorithm requires $\exp(Δ^{Ω(1)})$ colors [Assadi, Chen, and Sun, 2022]. Consequently, deterministic algorithms that use few colors must necessarily make multiple passes over the stream. Prior to this work, the best known deterministic trade-offs were: an $O(Δ^2)$-coloring in 2 passes, an $O(Δ)$-coloring in $O(\log Δ)$ passes [Assadi, Chen, and Sun, 2022], and a $(Δ+1)$-coloring in $O(\log Δ\cdot \log\log Δ)$ passes [Assadi, Chakrabarti, Ghosh, and Stoeckl, 2023]. It remained open whether better trade-offs -- particularly with sub-logarithmic pass complexity and linear-in-$Δ$ palette size -- were achievable. In this paper, we present a new deterministic semi-streaming algorithm that computes an $O(Δ)$-coloring in $O(\sqrt{\log Δ})$ passes. This is the first deterministic streaming algorithm to achieve a coloring with palette size linear-in-$Δ$ using sublogarithmic-in-$Δ$ passes.

DSApr 30
Simpler and Improved Replacement Path Coverings

Davide Bilò, Shiri Chechik, Keerti Choudhary et al.

An important tool in the design of fault-tolerant graph data structures are $(L,f)$-replacement path coverings (RPCs). An RPC is a family $\mathcal{G}$ of subgraphs of a given graph $G$ such that, for every set $F$ of at most $f$ edges, there is a subfamily $\mathcal{G}_F \,{\subseteq}\, \mathcal{G}$ with the following properties. (1) No subgraph in $\mathcal{G}_F$ contains an edge of $F$. (2) For each pair of vertices $s,t$ that have a shortest path in $G-F$ with at most $L$ edges, one such path also exists in some subgraph in $\mathcal{G}_F$. The covering value of the RPC is the total number $|\mathcal{G}|$ of subgraphs. The query time is the time needed to compute the subfamily $\mathcal{G}_F$ given the set $F$. Weimann and Yuster [TALG'13] devised a randomized RPC with covering value $\widetilde{O}(fL^f)$ and query time $\widetilde{O}(f^2 L^f)$. This was derandomized by Karthik and Parter [TALG'24], who also reduced the query time to $\widetilde{O}(f^2 L)$. Their approach uses some heavy algebraic machinery involving error-correcting codes and an increased covering value of $O((cfL \log n)^{f+1})$ for some constant $c > 1$. We instead devise a much simpler derandomization via conditional expectations that lowers the covering value back to $\widetilde{O}(fL^{f+o(1)})$ and decreases the query time to $\widetilde{O}(f^{5/2}L^{o(1)})$, assuming $f = o(\log L)$. We also investigate the optimal covering value of any $(L,f)$-replacement path covering (deterministic or randomized) for different parameter ranges. We provide a new randomized construction as well as improving a known lower bound, also by Karthik and Parter. For example, for $f = o(\log L)$, we give an RPC with $\widetilde{O}( (L/f)^f L^{o(1)})$ subgraphs and show that this is tight up to the $L^{o(1)}$ term.

LGJun 12, 2017
Clustering Small Samples with Quality Guarantees: Adaptivity with One2all pps

Edith Cohen, Shiri Chechik, Haim Kaplan

Clustering of data points is a fundamental tool in data analysis. We consider points $X$ in a relaxed metric space, where the triangle inequality holds within a constant factor. The {\em cost} of clustering $X$ by $Q$ is $V(Q)=\sum_{x\in X} d_{xQ}$. Two basic tasks, parametrized by $k \geq 1$, are {\em cost estimation}, which returns (approximate) $V(Q)$ for queries $Q$ such that $|Q|=k$ and {\em clustering}, which returns an (approximate) minimizer of $V(Q)$ of size $|Q|=k$. With very large data sets $X$, we seek efficient constructions of small samples that act as surrogates to the full data for performing these tasks. Existing constructions that provide quality guarantees are either worst-case, and unable to benefit from structure of real data sets, or make explicit strong assumptions on the structure. We show here how to avoid both these pitfalls using adaptive designs. At the core of our design is the {\em one2all} construction of multi-objective probability-proportional-to-size (pps) samples: Given a set $M$ of centroids and $α\geq 1$, one2all efficiently assigns probabilities to points so that the clustering cost of {\em each} $Q$ with cost $V(Q) \geq V(M)/α$ can be estimated well from a sample of size $O(α|M|ε^{-2})$. For cost queries, we can obtain worst-case sample size $O(kε^{-2})$ by applying one2all to a bicriteria approximation $M$, but we adaptively balance $|M|$ and $α$ to further reduce sample size. For clustering, we design an adaptive wrapper that applies a base clustering algorithm to a sample $S$. Our wrapper uses the smallest sample that provides statistical guarantees that the quality of the clustering on the sample carries over to the full data set. We demonstrate experimentally the huge gains of using our adaptive instead of worst-case methods.

SIMar 30, 2015
Average Distance Queries through Weighted Samples in Graphs and Metric Spaces: High Scalability with Tight Statistical Guarantees

Shiri Chechik, Edith Cohen, Haim Kaplan

The average distance from a node to all other nodes in a graph, or from a query point in a metric space to a set of points, is a fundamental quantity in data analysis. The inverse of the average distance, known as the (classic) closeness centrality of a node, is a popular importance measure in the study of social networks. We develop novel structural insights on the sparsifiability of the distance relation via weighted sampling. Based on that, we present highly practical algorithms with strong statistical guarantees for fundamental problems. We show that the average distance (and hence the centrality) for all nodes in a graph can be estimated using $O(ε^{-2})$ single-source distance computations. For a set $V$ of $n$ points in a metric space, we show that after preprocessing which uses $O(n)$ distance computations we can compute a weighted sample $S\subset V$ of size $O(ε^{-2})$ such that the average distance from any query point $v$ to $V$ can be estimated from the distances from $v$ to $S$. Finally, we show that for a set of points $V$ in a metric space, we can estimate the average pairwise distance using $O(n+ε^{-2})$ distance computations. The estimate is based on a weighted sample of $O(ε^{-2})$ pairs of points, which is computed using $O(n)$ distance computations. Our estimates are unbiased with normalized mean square error (NRMSE) of at most $ε$. Increasing the sample size by a $O(\log n)$ factor ensures that the probability that the relative error exceeds $ε$ is polynomially small.