Konstantin Avrachenkov

h-index37

25papers

261citations

Novelty46%

AI Score47

Ranked #30,920 of 194,257 authors (top 16%)#7,321 in LG (top 18%)

25 Papers

5.9SYApr 7, 2023

Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov

We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.

5.4PRApr 9

Planted clique recovery in random geometric graphs

Konstantin Avrachenkov, Andrei Bobu, Nelly Litvak et al.

We investigate the problem of identifying planted cliques in random geometric graphs, focusing on two distinct algorithmic approaches: the first based on vertex degrees (VD) and the other on common neighbors (CN). We analyze the performance of these methods under varying regimes of key parameters, namely the average degree of the graph and the size of the planted clique. We demonstrate that exact recovery is achieved with high probability as the graph size increases, in a specific set of parameters. Notably, our results reveal that the CN-algorithm significantly outperforms the VD-algorithm. In particular, in the connectivity regime, tiny planted cliques (even edges) are correctly identified by the CN-algorithm, yielding a significant impact on anomaly detection. Finally, our results are confirmed by a series of numerical experiments, showing that the devised algorithms are effective in practice.

1.2NANov 23, 2017

Hamiltonian System Approach to Distributed Spectral Decomposition in Networks

Konstantin Avrachenkov, Philippe Jacquet, Jithin Sreedharan

Because of the significant increase in size and complexity of the networks, the distributed computation of eigenvalues and eigenvectors of graph matrices has become very challenging and yet it remains as important as before. In this paper we develop efficient distributed algorithms to detect, with higher resolution, closely situated eigenvalues and corresponding eigenvectors of symmetric graph matrices. We model the system of graph spectral computation as physical systems with Lagrangian and Hamiltonian dynamics. The spectrum of Laplacian matrix, in particular, is framed as a classical spring-mass system with Lagrangian dynamics. The spectrum of any general symmetric graph matrix turns out to have a simple connection with quantum systems and it can be thus formulated as a solution to a Schrödinger-type differential equation. Taking into account the higher resolution requirement in the spectrum computation and the related stability issues in the numerical solution of the underlying differential equation, we propose the application of symplectic integrators to the calculation of eigenspectrum. The effectiveness of the proposed techniques is demonstrated with numerical simulations on real-world networks of different sizes and complexities.

1.2PFMar 10, 2017

Hitting Times in Markov Chains with Restart and their Application to Network Centrality

Konstantin Avrachenkov, Alexey Piunovskiy, Yi Zhang

Motivated by applications in telecommunications, computer scienceand physics, we consider a discrete-time Markov process withrestart. At each step the process eitherwith a positive probability restarts from a given distribution, orwith the complementary probability continues according to a Markovtransition kernel. The main contribution of the present work is thatwe obtain an explicit expression for the expectation of the hittingtime (to a given target set) of the process with restart.The formula is convenient when considering the problem of optimizationof the expected hitting time with respect to the restart probability.We illustrate our results with two examplesin uncountable and countable state spaces andwith an application to network centrality.

4.1LGNov 12, 2025

MARBLE: Multi-Armed Restless Bandits in Latent Markovian Environment

Mohsen Amiri, Konstantin Avrachenkov, Ibtihal El Mimouni et al.

Restless Multi-Armed Bandits (RMABs) are powerful models for decision-making under uncertainty, yet classical formulations typically assume fixed dynamics, an assumption often violated in nonstationary environments. We introduce MARBLE (Multi-Armed Restless Bandits in a Latent Markovian Environment), which augments RMABs with a latent Markov state that induces nonstationary behavior. In MARBLE, each arm evolves according to a latent environment state that switches over time, making policy learning substantially more challenging. We further introduce the Markov-Averaged Indexability (MAI) criterion as a relaxed indexability assumption and prove that, despite unobserved regime switches, under the MAI criterion, synchronous Q-learning with Whittle Indices (QWI) converges almost surely to the optimal Q-function and the corresponding Whittle indices. We validate MARBLE on a calibrated simulator-embedded (digital twin) recommender system, where QWI consistently adapts to a shifting latent state and converges to an optimal policy, empirically corroborating our theoretical findings.

6.4LGDec 17, 2024

Lagrangian Index Policy for Restless Bandits with Average Reward

Konstantin Avrachenkov, Vivek S. Borkar, Pratik Shah

We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrange index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.

1.2SIJul 27, 2025

Multi-Community Spectral Clustering for Geometric Graphs

Luiz Emilio Allem, Konstantin Avrachenkov, Carlos Hoppen et al.

In this paper, we consider the soft geometric block model (SGBM) with a fixed number $k \geq 2$ of homogeneous communities in the dense regime, and we introduce a spectral clustering algorithm for community recovery on graphs generated by this model. Given such a graph, the algorithm produces an embedding into $\mathbb{R}^{k-1}$ using the eigenvectors associated with the $k-1$ eigenvalues of the adjacency matrix of the graph that are closest to a value determined by the parameters of the model. It then applies $k$-means clustering to the embedding. We prove weak consistency and show that a simple local refinement step ensures strong consistency. A key ingredient is an application of a non-standard version of Davis-Kahan theorem to control eigenspace perturbations when eigenvalues are not simple. We also analyze the limiting spectrum of the adjacency matrix, using a combination of combinatorial and matrix techniques.

7.3AIJun 4, 2024

Tabular and Deep Learning for the Whittle Index

Francisco Robledo Relaño, Vivek Borkar, Urtzi Ayesta et al.

The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this paper we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales, a faster one to update the state-action Q -values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of QWI algorithm using neural networks to compute the Q -values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm, neural-network based approximate Q-learning and other state of the art algorithms.

2.6LGJun 3, 2024

Deep reinforcement learning for weakly coupled MDP's with continuous actions

Francisco Robledo, Urtzi Ayesta, Konstantin Avrachenkov

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greadily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlight LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.

8.4LGMar 10, 2021

Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme

K. E. Avrachenkov, V. S. Borkar, H. P. Dolhare et al.

We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the o.d.e. (for 'ordinary differential equation') approach and point out certain theoretical issues. We then propose a modified scheme called Full Gradient DQN (FG-DQN, for short) that has a sound theoretical basis and compare it with the original scheme on sample problems. We observe a better performance for FG-DQN.

9.0IRSep 17, 2020

Online Algorithms for Estimating Change Rates of Web Pages

Konstantin Avrachenkov, Kishor Patil, Gugan Thoppe

A search engine maintains local copies of different web pages to provide quick search results. This local cache is kept up-to-date by a web crawler that frequently visits these different pages to track changes in them. Ideally, the local copy should be updated as soon as a page changes on the web. However, finite bandwidth availability and server restrictions limit how frequently different pages can be crawled. This brings forth the following optimization problem: maximize the freshness of the local cache subject to the crawling frequencies being within prescribed bounds. While tractable algorithms do exist to solve this problem, these either assume the knowledge of exact page change rates or use inefficient methods such as MLE for estimating the same. We address this issue here. We provide three novel schemes for online estimation of page change rates, all of which have extremely low running times per iteration. The first is based on the law of large numbers and the second on stochastic approximation. The third is an extension of the second and includes a heavy-ball momentum term. All these schemes only need partial information about the page change process, i.e., they only need to know if the page has changed or not since the last crawled instance. Our main theoretical results concern asymptotic convergence and convergence rates of these three schemes. In fact, our work is the first to show convergence of the original stochastic heavy-ball method when neither the gradient nor the noise variance is uniformly bounded. We also provide some numerical experiments (based on real and synthetic data) to demonstrate the superiority of our proposed estimators over existing ones such as MLE. We emphasize that our algorithms are also readily applicable to the synchronization of databases and network inventory management.

7.2LGSep 4, 2020

LFGCN: Levitating over Graphs with Levy Flights

Yuzhou Chen, Yulia R. Gel, Konstantin Avrachenkov

Due to high utility in many applications, from social networks to blockchain to power grids, deep learning on non-Euclidean objects such as graphs and manifolds, coined Geometric Deep Learning (GDL), continues to gain an ever increasing interest. We propose a new Lévy Flights Graph Convolutional Networks (LFGCN) method for semi-supervised learning, which casts the Lévy Flights into random walks on graphs and, as a result, allows both to accurately account for the intrinsic graph topology and to substantially improve classification performance, especially for heterogeneous graphs. Furthermore, we propose a new preferential P-DropEdge method based on the Girvan-Newman argument. That is, in contrast to uniform removing of edges as in DropEdge, following the Girvan-Newman algorithm, we detect network periphery structures using information on edge betweenness and then remove edges according to their betweenness centrality. Our experimental results on semi-supervised node classification tasks demonstrate that the LFGCN coupled with P-DropEdge accelerates the training task, increases stability and further improves predictive accuracy of learned graph topology structure. Finally, in our case studies we bring the machinery of LFGCN and other deep networks tools to analysis of power grid networks - the area where the utility of GDL remains untapped.

4.3STAug 11, 2020Code

Community recovery in non-binary and temporal stochastic block models

Konstantin Avrachenkov, Maximilien Dreveton, Lasse Leskelä

This article studies the estimation of latent community memberships from pairwise interactions in a network of $N$ nodes, where the observed interactions can be of arbitrary type, including binary, categorical, and vector-valued, and not excluding even more general objects such as time series or spatial point patterns. As a generative model for such data, we introduce a stochastic block model with a general measurable interaction space $\mathcal S$, for which we derive information-theoretic bounds for the minimum achievable error rate. These bounds yield sharp criteria for the existence of consistent and strongly consistent estimators in terms of data sparsity, statistical similarity between intra- and inter-block interaction distributions, and the shape and size of the interaction space. The general framework makes it possible to study temporal and multiplex networks with $\mathcal S = \{0,1\}^T$, in settings where both $N \to \infty$ and $T \to \infty$, and the temporal interaction patterns are correlated over time. For temporal Markov interactions, we derive sharp consistency thresholds. We also present fast online estimation algorithms which fully utilise the non-binary nature of the observed data. Numerical experiments on synthetic and real data show that these algorithms rapidly produce accurate estimates even for very sparse data arrays.

5.8LGJul 29, 2020Code

Almost exact recovery in noisy semi-supervised learning

Konstantin Avrachenkov, Maximilien Dreveton

Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.

4.7OCJul 8, 2020

Dynamic social learning under graph constraints

Konstantin Avrachenkov, Vivek S. Borkar, Sharayu Moharir et al.

We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $α$-homogeneous rewards. We show that its empirical process, which can be written as a stochastic approximation recursion with Markov noise, has the same probability law as a certain vertex reinforced random walk. We use this equivalence to show that for $α> 0$, the asymptotic outcome concentrates around the optimum in a certain limiting sense when `annealed' by letting $α\uparrow\infty$ slowly.

19.1LGApr 29, 2020

Whittle index based Q-learning for restless bandits with average reward

Konstantin E. Avrachenkov, Vivek S. Borkar

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.

4.3IRApr 5, 2020

Change Rate Estimation and Optimal Freshness in Web Page Crawling

Konstantin Avrachenkov, Kishor Patil, Gugan Thoppe

For providing quick and accurate results, a search engine maintains a local snapshot of the entire web. And, to keep this local cache fresh, it employs a crawler for tracking changes across various web pages. However, finite bandwidth availability and server restrictions impose some constraints on the crawling frequency. Consequently, the ideal crawling rates are the ones that maximise the freshness of the local cache and also respect the above constraints. Azar et al. 2018 recently proposed a tractable algorithm to solve this optimisation problem. However, they assume the knowledge of the exact page change rates, which is unrealistic in practice. We address this issue here. Specifically, we provide two novel schemes for online estimation of page change rates. Both schemes only need partial information about the page change process, i.e., they only need to know if the page has changed or not since the last crawled instance. For both these schemes, we prove convergence and, also, derive their convergence rates. Finally, we provide some numerical experiments to compare the performance of our proposed estimators with the existing ones (e.g., MLE).

0.8LGOct 7, 2018Code

Graphlet Count Estimation via Convolutional Neural Networks

Xutong Liu, Yu-Zhen Janice Chen, John C. S. Lui et al.

Graphlets are defined as k-node connected induced subgraph patterns. For an undirected graph, 3-node graphlets include close triangle and open triangle. When k = 4, there are six types of graphlets, e.g., tailed-triangle and clique are two possible 4-node graphlets. The number of each graphlet, called graphlet count, is a signature which characterizes the local network structure of a given graph. Graphlet count plays a prominent role in network analysis of many fields, most notably bioinformatics and social science. However, computing exact graphlet count is inherently difficult and computational expensive because the number of graphlets grows exponentially large as the graph size and/or graphlet size k grow. To deal with this difficulty, many sampling methods were proposed to estimate graphlet count with bounded error. Nevertheless, these methods require large number of samples to be statistically reliable, which is still computationally demanding. Moreover, they have to repeat laborious counting procedure even if a new graph is similar or exactly the same as previous studied graphs. Intuitively, learning from historic graphs can make estimation more accurate and avoid many repetitive counting to reduce computational cost. Based on this idea, we propose a convolutional neural network (CNN) framework and two preprocessing techniques to estimate graphlet count. Extensive experiments on two types of random graphs and real world biochemistry graphs show that our framework can offer substantial speedup on estimating graphlet count of new graphs with high accuracy.

7.3SIJun 20, 2018

Mean Field Analysis of Personalized PageRank with Implications for Local Graph Clustering

Konstantin Avrachenkov, Arun Kadavankandy, Nelly Litvak

We analyse a mean-field model of Personalized PageRank on the Erdos-Renyi random graph containing a denser planted Erdos-Renyi subgraph. We investigate the regimes where the values of Personalized PageRank concentrate around the mean-field value. We also study the optimization of the damping factor, the only parameter in Personalized PageRank. Our theoretical results help to understand the applicability of Personalized PageRank and its limitations for local graph clustering.

4.3LGNov 10, 2016

The Power of Side-information in Subgraph Detection

Arun Kadavankandy, Konstantin Avrachenkov, Laura Cottatellucci et al.

In this work, we tackle the problem of hidden community detection. We consider Belief Propagation (BP) applied to the problem of detecting a hidden Erdős-Rényi (ER) graph embedded in a larger and sparser ER graph, in the presence of side-information. We derive two related algorithms based on BP to perform subgraph detection in the presence of two kinds of side-information. The first variant of side-information consists of a set of nodes, called cues, known to be from the subgraph. The second variant of side-information consists of a set of nodes that are cues with a given probability. It was shown in past works that BP without side-information fails to detect the subgraph correctly when an effective signal-to-noise ratio (SNR) parameter falls below a threshold. In contrast, in the presence of non-trivial side-information, we show that the BP algorithm achieves asymptotically zero error for any value of the SNR parameter. We validate our results through simulations on synthetic datasets as well as on a few real world networks.

1.2SIOct 19, 2015

Bayesian Inference of Online Social Network Statistics via Lightweight Random Walk Crawls

Konstantin Avrachenkov, Bruno Ribeiro, Jithin K. Sreedharan

Online social networks (OSN) contain extensive amount of information about the underlying society that is yet to be explored. One of the most feasible technique to fetch information from OSN, crawling through Application Programming Interface (API) requests, poses serious concerns over the the guarantees of the estimates. In this work, we focus on making reliable statistical inference with limited API crawls. Based on regenerative properties of the random walks, we propose an unbiased estimator for the aggregated sum of functions over edges and proved the connection between variance of the estimator and spectral gap. In order to facilitate Bayesian inference on the true value of the estimator, we derive the approximate posterior distribution of the estimate. Later the proposed ideas are validated with numerical experiments on inference problems in real-world networks.

2.1LGSep 4, 2015

Parallel and Distributed Approaches for Graph Based Semi-supervised Learning

Konstantin Avrachenkov, Vivek Borkar, Krishnakant Saboo

Two approaches for graph based semi-supervised learning are proposed. The firstapproach is based on iteration of an affine map. A key element of the affine map iteration is sparsematrix-vector multiplication, which has several very efficient parallel implementations. The secondapproach belongs to the class of Markov Chain Monte Carlo (MCMC) algorithms. It is based onsampling of nodes by performing a random walk on the graph. The latter approach is distributedby its nature and can be easily implemented on several processors or over the network. Boththeoretical and practical evaluations are provided. It is found that the nodes are classified intotheir class with very small error. The sampling algorithm's ability to track new incoming nodesand to classify them is also demonstrated.

3.9LGAug 20, 2015

Semi-supervised Learning with Regularized Laplacian

Konstantin Avrachenkov, Pavel Chebotarev, Alexey Mishenin

We study a semi-supervised learning method based on the similarity graph and RegularizedLaplacian. We give convenient optimization formulation of the Regularized Laplacian method and establishits various properties. In particular, we show that the kernel of the methodcan be interpreted in terms of discrete and continuous time random walks and possesses several importantproperties of proximity measures. Both optimization and linear algebra methods can be used for efficientcomputation of the classification functions. We demonstrate on numerical examples that theRegularized Laplacian method is competitive with respect to the other state of the art semi-supervisedlearning methods.

17.1IRMar 30, 2015

Whittle Index Policy for Crawling Ephemeral Content

Konstantin Avrachenkov, Vivek Borkar

We consider a task of scheduling a crawler to retrieve content from several sites with ephemeral content. A user typically loses interest in ephemeral content, like news or posts at social network groups, after several days or hours. Thus, development of timely crawling policy for such ephemeral information sources is very important. We first formulate this problem as an optimal control problem with average reward. The reward can be measured in the number of clicks or relevant search requests. The problem in its initial formulation suffers from the curse of dimensionality and quickly becomes intractable even with moderate number of information sources. Fortunately, this problem admits a Whittle index, which leads to problem decomposition and to a very simple and efficient crawling policy. We derive the Whittle index and provide its theoretical justification.

3.3IRAug 4, 2014

Personalized PageRank with Node-dependent Restart

Konstantin Avrachenkov, Remco W. Van Der Hofstad, Marina Sokol

Personalized PageRank is an algorithm to classify the improtance of web pages on a user-dependent basis. We introduce two generalizations of Personalized PageRank with node-dependent restart. The first generalization is based on the proportion of visits to nodes before the restart, whereas the second generalization is based on the probability of visited node just before the restart. In the original case of constant restart probability, the two measures coincide. We discuss interesting particular cases of restart probabilities and restart distributions. We show that the both generalizations of Personalized PageRank have an elegant expression connecting the so-called direct and reverse Personalized PageRanks that yield a symmetry property of these Personalized PageRanks.