DS DM LG SI CO
Sep 19, 2017

Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

Charalampos E. Tsourakakis, Michael Mitzenmacher, Kasper Green Larsen, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos

arXiv:1709.07308v38.622 citations
Has Code

Originality: Incremental advance

AI Analysis

This work addresses edge sign prediction in signed social networks, that is, predicting whether a link between two nodes is positive or negative. It pairs improved query-complexity bounds for a noisy two-cluster model with experiments on real-world signed networks.

The paper tackles the edge sign prediction problem in signed graphs by developing an algorithm that recovers all signs with high probability from noisy queries using $O(\frac{n\log n}{δ^2}+\frac{\log^2 n}{δ^6})$ queries, where $δ=1-2q$ is the bias of a query that is corrupted with probability $q$. This improves on the earlier bound of Mazumdar and Saha for all but tiny $δ$. Empirical results suggest that using short edge-disjoint paths as features improves classification accuracy, particularly for node pairs without common neighbors.

Social networks involve both positive and negative relationships, which can be captured in signed graphs. The {\em edge sign prediction problem} aims to predict whether an interaction between a pair of nodes will be positive or negative. We provide theoretical results for this problem that motivate natural improvements to recent heuristics. The edge sign prediction problem is related to correlation clustering: a positive relationship between two nodes means that they are in the same cluster. We consider the following model for two clusters: we may query whether any pair of nodes belongs to the same cluster, but the answer to each query is corrupted with some probability $0<q<\frac{1}{2}$. Let $δ=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise using $O(\frac{n\log n}{δ^2}+\frac{\log^2 n}{δ^6})$ queries. This is the best known result for this problem for all but tiny $δ$, improving on the recent work of Mazumdar and Saha \cite{mazumdar2017clustering}. We also provide an algorithm that performs $O(\frac{n\log n}{δ^4})$ queries and uses breadth-first search as its main algorithmic primitive. While both the running time and the number of queries for this algorithm are sub-optimal, our result relies on novel theoretical techniques and naturally suggests the use of edge-disjoint paths as a feature for predicting signs in online social networks. Correspondingly, we experiment with using short edge-disjoint $s$-$t$ paths as a feature for predicting the sign of edge $(s,t)$ in real-world signed networks. Empirical findings suggest that the use of such paths improves the classification accuracy, especially for pairs of nodes with no common neighbors.
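
To make the two-cluster query model concrete, here is a minimal Python sketch. It is not the algorithm from the paper; it only illustrates why edge-disjoint paths carry signal. Under the model above, the product of the two noisy answers along a path $s$-$w$-$t$ equals the true sign of $(s,t)$ unless exactly one of the two answers was flipped, which happens with probability $2q(1-q)=\frac{1-δ^2}{2}$, so each such path has bias $δ^2$. Paths through distinct intermediate nodes $w$ share no edges, so their votes are independent and a majority vote amplifies the signal. The names `NoisyOracle`, `predict_sign`, and `demo` are hypothetical, and the cache encodes an assumption that asking the same pair twice returns the same answer.

```python
import random


class NoisyOracle:
    """Hypothetical oracle: answers 'same cluster?' as +1/-1, flipped with probability q."""

    def __init__(self, labels, q, seed=0):
        self.labels = labels
        self.q = q
        self.rng = random.Random(seed)
        # Assumption: asking about the same pair again returns the same answer.
        self.cache = {}

    def query(self, u, v):
        key = (min(u, v), max(u, v))
        if key not in self.cache:
            truth = 1 if self.labels[u] == self.labels[v] else -1
            flipped = self.rng.random() < self.q
            self.cache[key] = -truth if flipped else truth
        return self.cache[key]


def predict_sign(oracle, s, t, intermediates):
    """Majority vote over the length-2 paths s-w-t, one per intermediate node w.

    Use an odd number of intermediates so the vote cannot tie.
    """
    votes = sum(oracle.query(s, w) * oracle.query(w, t) for w in intermediates)
    return 1 if votes > 0 else -1


def demo(n=400, q=0.1, num_paths=201, seed=0):
    """Plant two random clusters, then predict the sign of the pair (0, 1)."""
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    oracle = NoisyOracle(labels, q, seed=seed + 1)
    s, t = 0, 1
    intermediates = [w for w in range(n) if w not in (s, t)][:num_paths]
    truth = 1 if labels[s] == labels[t] else -1
    return predict_sign(oracle, s, t, intermediates), truth


if __name__ == "__main__":
    predicted, truth = demo()
    print("predicted sign:", predicted, "true sign:", truth)
```

Each vote here costs two queries, so this sketch is far from query-efficient. It is meant only to show how independent evidence from edge-disjoint paths can be combined, the same intuition behind using such paths as a classification feature.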
