Shuyi Yan

4 papers

6 citations

Novelty: 57%

AI Score: 49

Ranked #46,541 of 205,806 authors (top 23%) · #115 in DS (top 20%)

4 Papers

77.7 · DS · May 11

Static to Dynamic Correlation Clustering

Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee et al.

Correlation clustering is a well-studied problem, first proposed by Bansal, Blum, and Chawla [Mach. Learn. '04]. The input is an unweighted, undirected graph, and the goal is to cluster the vertices so as to minimize the number of disagreements: edges between vertices in different clusters plus missing edges between vertices in the same cluster. The problem has wide applications in data mining and machine learning. We introduce a general framework that transforms existing static correlation clustering algorithms into fully-dynamic ones that work against an adaptive adversary. We show how to apply our framework to known efficient correlation clustering algorithms, starting from the classic 3-approximate Pivot algorithm of Ailon, Charikar and Newman [JACM'08]. Applied to the most recent sublinear $1.485$-approximation algorithm of Cao, Cohen-Addad, Lee, Li, Lolck, Newman, Thorup, Vogl, Yan and Zhang [STOC'25], our framework yields a $1.485$-approximate fully-dynamic algorithm with worst-case constant update time. The original static algorithm achieves its approximation factor with constant probability, and we achieve the same against an adaptive adversary in the following sense: for any given update step, not known to our algorithm, our solution is a $1.485$-approximation with constant probability when we reach that update. Most previous dynamic algorithms, including the celebrated result of Behnezhad, Charikar, Ma and Tan [FOCS'19], had approximation factors around $3$ in expectation and could only handle an oblivious adversary. A recent algorithm by Braverman, Dharangutte, Pai, Shah, and Wang [AISTATS'25] can handle an adaptive adversary, but its constant approximation ratio is large and unspecified. In contrast, our general transformation works with all the best approximation factors known for the static case.
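To make the objective and the static starting point concrete, here is a minimal Python sketch of the classic Pivot rule as it is commonly stated (pick a uniformly random unclustered vertex, put it in a cluster together with its unclustered neighbours, repeat) and of the disagreement cost it is measured by. The function names `pivot` and `disagreements` and the adjacency-set input format are hypothetical illustration choices; this is not the dynamic framework described in the abstract.

```python
import itertools
import random


def pivot(vertices, adj, rng):
    """Classic Pivot clustering (3-approximate in expectation).

    Visiting vertices in a uniformly random order and skipping those already
    clustered is equivalent to repeatedly choosing a uniformly random
    unclustered pivot. Returns {vertex: pivot of its cluster}.
    """
    order = list(vertices)
    rng.shuffle(order)
    label = {}
    for v in order:
        if v in label:          # already absorbed by an earlier pivot
            continue
        label[v] = v            # v becomes a pivot
        for u in adj[v]:        # its still-unclustered neighbours join it
            if u not in label:
                label[u] = v
    return label


def disagreements(vertices, adj, label):
    """Edges between clusters plus non-edges inside clusters."""
    cost = 0
    for u, v in itertools.combinations(vertices, 2):
        same_cluster = label[u] == label[v]
        is_edge = v in adj[u]
        if same_cluster != is_edge:
            cost += 1
    return cost
```

For example, on two triangles joined by a single edge, the natural clustering into the two triangles pays exactly one disagreement (the bridge), while a single run of `pivot(V, adj, random.Random(0))` may pay more, since the guarantee is only in expectation.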

6.9 · DS · May 18

Estimating Random-Walk Probabilities in Directed Graphs

Christian Bertram, Mads Vestergaard Jensen, Mikkel Thorup et al.

We study discounted random walks in directed graphs. In each step, the walk either terminates with a constant probability $α$ or proceeds to a uniformly random out-neighbor. Our goal is to estimate the probability $π(s, t)$ that a discounted random walk starting from $s$ terminates at $t$. This probability is also known as the Personalized PageRank (PPR) score, which measures the relevance of $t$ to $s$, for instance when $s$ and $t$ are web pages on the Internet. We aim to estimate $π(s, t)$ within a constant relative error with constant probability. A variety of algorithms have been developed for several problem variants, such as single-pair, single-source, single-target, and single-node estimation, under both worst-case and average-case settings, and for different combinations of allowed graph queries. However, in many important cases, polynomial gaps remain between the known upper and lower bounds. In this paper, we establish tight upper and lower bounds (up to logarithmic factors of $n$) for all problem variants and query combinations, closing all existing gaps in both the worst-case and average-case settings. Below are some examples for the worst-case setting. As an upper-bound example, the classic power method estimates $π(s,t)$ in time $O(m\log(1/δ))$ provided it is above a threshold $δ$, but $π(s,t)$ can be as small as $1/n^{Θ(n)}$. In contrast, we propose algorithms that deterministically estimate arbitrarily small $π(s,t)$ in $O(m\log n)$ time. As a lower-bound example, we improve the lower bound for the single-pair problem from $Ω(\min\{n,1/δ\})$ to $Ω(\min\{m,1/δ\})$, which is optimal (up to logarithmic factors) since a simple Monte Carlo estimate takes $O(1/δ)$ time.
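The two baselines mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithms: `power_method_ppr` pushes the still-alive probability mass forward one step per round, so after $K$ rounds at most $(1-α)^K$ of the mass is missing and each round costs $O(m)$; `monte_carlo_ppr` simulates independent walks and reports the fraction that stop at $t$. The function names, the `out` adjacency-list format and the assumption that every vertex has at least one out-neighbor (the abstract does not say how dangling vertices are treated) are all hypothetical choices for this sketch.

```python
import random


def power_method_ppr(out, s, alpha, iters):
    """Approximate {v: pi(s, v)} by `iters` rounds of forward mass pushing.

    Assumes every vertex has at least one out-neighbour in `out`.
    The result underestimates each pi(s, v); the total missing mass
    is (1 - alpha) ** iters.
    """
    alive = {s: 1.0}   # probability the walk is still running at v
    pi = {}
    for _ in range(iters):
        nxt = {}
        for v, mass in alive.items():
            # the walk stops at v with probability alpha
            pi[v] = pi.get(v, 0.0) + alpha * mass
            # otherwise it moves to a uniformly random out-neighbour
            share = (1 - alpha) * mass / len(out[v])
            for u in out[v]:
                nxt[u] = nxt.get(u, 0.0) + share
        alive = nxt
    return pi


def monte_carlo_ppr(out, s, t, alpha, walks, rng):
    """Fraction of `walks` simulated discounted walks from s that stop at t."""
    hits = 0
    for _ in range(walks):
        v = s
        while rng.random() >= alpha:   # continue with probability 1 - alpha
            v = rng.choice(out[v])
        if v == t:
            hits += 1
    return hits / walks
```

On the two-vertex cycle `out = {0: [1], 1: [0]}` with $α = 1/2$, the walk stops at its start after an even number of steps, so $π(0,0) = \tfrac12(1 + \tfrac14 + \tfrac1{16} + \cdots) = 2/3$ and $π(0,1) = 1/3$; both functions should approach these values.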

24.1 · DS · May 11

Edge-weighted Online Stochastic Matching Under Jaillet-Lu LP

Shuyi Yan

The online stochastic matching problem was introduced by [FMMM09], together with the $(1-\frac1e)$-competitive Suggested Matching algorithm. In the most general, edge-weighted setting, this ratio remained unimproved for more than a decade, until [Yan24] recently beat the $1-\frac1e$ bound and [QFZW23] further improved it to $0.650$. Both works measure online competitiveness against the offline LP relaxation introduced by Jaillet and Lu [JL14]. The same LP has also played an important role in other settings, as it is a natural choice for two-choice online algorithms. In this paper, we prove an upper bound of $0.663$ and a lower bound of $0.662$ for edge-weighted online stochastic matching under the Jaillet-Lu LP. We propose a simple hard instance and identify the optimal online algorithm for this specific instance, whose competitive ratio is below $0.663$. Despite the simplicity of the instance, we then show that a near-optimal algorithm for it, with a competitive ratio above $0.662$, can be generalized to work on all instances without any loss. Because our algorithm is generalized from an actual near-optimal algorithm rather than obtained by manually combining trivial strategies, it has two natural advantages over previous works: (1) its matching strategy varies over time; (2) it uses global information about offline vertices. On the other hand, the upper bound suggests that more powerful LPs and multiple-choice strategies are needed to improve the ratio by more than $0.001$. In addition to our main result, we generalize the asymptotic equivalence between the Poisson arrival model and the original online stochastic matching model established by [HS21], removing the requirement that the online algorithm be approximately monotone.
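For intuition about the $(1-\frac1e)$ baseline, the following Python sketch simulates a fractional, LP-driven "suggested matching" rule in the edge-weighted setting. It is an assumption-laden illustration, not [FMMM09]'s exact presentation and not the algorithm of this paper: we assume an offline LP solution `x[(i, j)]` giving the expected number of times edge $(i, j)$ is used, expected arrival counts `rates[i]` per online type with $\sum_j x_{ij} \le$ `rates[i]`, and free disposal (an offline vertex may drop its current edge for a heavier one). Each arrival of type $i$ proposes offline vertex $j$ with probability $x_{ij}/\text{rates}[i]$. All names are hypothetical.

```python
import random
from collections import defaultdict


def suggested_matching(arrivals, x, rates, weights, rng):
    """Simulate a fractional suggested-matching rule with free disposal.

    arrivals: online vertex types, in arrival order
    x:        {(i, j): LP value}, expected uses of edge (i, j)  (assumed input)
    rates:    {i: expected number of arrivals of type i}        (assumed input)
    weights:  {(i, j): edge weight}
    Returns {offline j: (online type, weight)} for the edges kept at the end.
    """
    # Type i proposes offline vertex j with probability x[i, j] / rates[i].
    proposals = defaultdict(list)
    for (i, j), val in x.items():
        proposals[i].append((j, val / rates[i]))

    kept = {}
    for i in arrivals:
        r = rng.random()
        acc = 0.0
        for j, p in proposals[i]:
            acc += p
            if r < acc:
                w = weights[(i, j)]
                # Free disposal: keep whichever of the old and new edge is heavier.
                if j not in kept or kept[j][1] < w:
                    kept[j] = (i, w)
                break
        # With the leftover probability 1 - sum(p), this arrival stays unmatched.
    return kept
```

The matched weight of a run is `sum(w for _, w in kept.values())`. Under Poisson arrivals, the number of proposals an offline vertex receives along edges of weight at least $w$ is Poisson with mean at most $1$, which is the usual route to the $1-\frac1e$ bound for this kind of rule.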

30.8 · DS · Mar 12

Pivot based correlation clustering in the presence of good clusters

David Rasmussen Lolck, Mikkel Thorup, Shuyi Yan

The classic pivot-based clustering algorithm of Ailon, Charikar and Newman [JACM'08] is a 3-approximation, but all concrete examples showing that it is no better than 3 are based on some very good clusters, e.g., a complete graph minus a matching. We show that removing all good clusters before each pivot step improves the approximation ratio to $2.9991$. To complement this, we also evaluate the proposed algorithm on synthetic datasets, where it performs remarkably well and improves over both the algorithm for locating good clusters and the classic pivot algorithm.
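A short worked calculation shows why a complete graph minus a matching is bad for plain Pivot. It assumes $n$ is even and the removed matching $M$ is perfect (our simplifying assumption). Whichever vertex $v$ is chosen first, its cluster contains every vertex except its partner $u = M(v)$, and $u$ is left as a singleton:

```latex
\begin{aligned}
C &= \{v\} \cup N(v) = V \setminus \{u\}, \qquad \{u\} \text{ is a singleton},\\
\mathrm{cost}_{\mathrm{Pivot}}
  &= \underbrace{\tfrac{n-2}{2}}_{\text{matching pairs inside } C}
   + \underbrace{(n-2)}_{\text{edges from } u \text{ to } C}
   = \tfrac{3(n-2)}{2},\\
\mathrm{OPT} &\le \tfrac{n}{2} \quad (\text{all of } V \text{ in one cluster}),\\
\frac{\mathrm{cost}_{\mathrm{Pivot}}}{\mathrm{OPT}}
  &\ge \frac{3(n-2)}{n} \;\longrightarrow\; 3 \quad (n \to \infty).
\end{aligned}
```

Here a single near-perfect cluster is exactly the kind of good cluster that would be detected and removed before pivoting.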