Frank W. Takes

SI
h-index8
6papers
22citations
Novelty45%
AI Score47

6 Papers

SIMay 1
Link Fraction Mixed Membership Reveals Community Diversity in Aggregated Social Networks

Gamal Adel, Eszter Bokányi, Eelke M. Heemskerk et al.

Community detection is a critical tool for understanding the mesoscopic structure of large-scale networks. However, when applied to aggregated or coarse-grained social networks, disjoint community partitions cannot capture the diverse composition of community memberships within aggregated nodes. While existing mixed membership methods alleviate this issue, they may detect communities that are highly sensitive to the aggregation resolution, not reliably reflecting the community structure of the underlying individual-level network. This paper presents the Link Fraction Mixed Membership (LFMM) method, which computes the mixed memberships of nodes in aggregated networks. Unlike existing mixed membership methods, LFMM is consistent under aggregation. Specifically, we show that it conserves community membership sums at different scales. The method is utilized to study a population-scale social network of the Netherlands, aggregated at different resolutions. Experiments reveal variation in community membership across different geographical regions and evolution over the last decade. In particular, we show how our method identifies large urban hubs that act as the melting pots of diverse, spatially remote communities.

SIApr 13
The anonymization problem in social networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

This paper introduces a unified computational framework for the anonymization problem in social networks, where the objective is to maximize node anonymity through graph alterations. We define three variants of the underlying optimization problem: full, partial and budgeted anonymization. In each variant, the objective is to maximize the number of $k$-anonymous nodes, i.e., nodes for which at least $k-1$ other nodes are equivalent under a particular anonymity measure. We propose four new heuristic network anonymization algorithms and implement these in ANO-NET, a reusable computational framework. Experiments on three common graph models and 19 real-world network datasets yield three empirical findings. First, regarding the method of alteration, experiments on graph models show that random edge deletion is more effective than edge rewiring and addition. Second, we show that the choice of anonymity measure strongly affects both initial network anonymity and the difficulty of anonymization. This highlights the importance of careful measure selection, matching a realistic attacker scenario. Third, comparing the four proposed algorithms and an edge sampling baseline from the literature, we find that an approach which preferentially deletes edges affecting structurally unique nodes, consistently outperforms heuristics based solely on network structure. Overall, our best performing algorithm retains on average 14 times more edges in full anonymization. Moreover, it yields 4.8 times more anonymous nodes than the baseline in the budgeted variant. On top of that, the best performing algorithm achieves a better trade-off between anonymity and data utility. This work provides a foundation for the future development of effective network anonymization algorithms.

SIMay 12
Fuzzy k-anonymity in complex networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

With the introduction of large-scale network data, including population-scale social networks, techniques for privacy-aware sharing of network data become increasingly important. While existing $k$-anonymity approaches can model different attacker scenarios, they typically assume that attacker knowledge exactly matches the published network structure. We argue that exact knowledge is often unrealistic and introduce $ϕ$-$k$-anonymity, a fuzzy variant of $k$-anonymity in which parameter $ϕ$ captures the level of uncertainty in attacker knowledge. Across a benchmark of $39$ real-world networks, a realistic level of uncertainty ($ϕ=5\%$) renders, on average, $64\%$ of previously unique nodes anonymous. To further enhance anonymity, we apply anonymization algorithms under a $5\%$ edge modification budget. While full anonymization is often unattainable under exact $k$-anonymity, with low uncertainty ($ϕ=10\%$) our newly proposed Greedy algorithm anonymizes over $99\%$ of the nodes. Uncertainty also enables effective anonymization in otherwise difficult to anonymize dense synthetic graphs. Additionally, data utility in terms of structural properties and performance on network analysis tasks is well preserved, with most metrics changing less than $5\%$. Overall, our findings suggest that modest uncertainty assumptions yield high levels of anonymity and utility, motivating further research on uncertainty-aware privacy guarantees for network data.

CVJul 9, 2025
Ecological Legacies of Pre-Columbian Settlements Evident in Palm Clusters of Neotropical Mountain Forests

Sebastian Fajardo, Sina Mohammadi, Jonas Gregorio de Souza et al.

Ancient populations markedly transformed Neotropical forests, yet the spatial extent of their ecological influence remains underexplored at high resolution. Here we present a deep learning and remote sensing based approach to estimate areas of pre-Columbian forest modification based on modern vegetation. We apply this method to high-resolution satellite imagery from the Sierra Nevada de Santa Marta, Colombia, as a demonstration of a scalable approach, to evaluate palm tree distributions in relation to archaeological infrastructure. Palms were significantly more abundant near archaeological sites with large infrastructure investment. The extent of the largest palm cluster indicates that ancient human-managed areas linked to major infrastructure sites may be up to two orders of magnitude bigger than indicated by current archaeological evidence alone. Our findings suggest that pre-Columbian populations influenced vegetation, fostering conditions conducive to palm proliferation, leaving a lasting ecological footprint. This may have lowered the logistical costs of establishing infrastructure-heavy settlements in less accessible locations.

LGOct 18, 2021
Fair Tree Classifier using Strong Demographic Parity

António Pereira Barata, Frank W. Takes, H. Jaap van den Herik et al.

When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label, whilst minimising for the discrimination towards any sensitive attribute, like gender or race, induced from biased data. A few hybrid tree optimisation criteria exist that combine classification performance and fairness. Although the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair tree classification methods mainly optimise for a fixed threshold on both the classification task as well as the fairness metric. In this paper, we propose a compound splitting criterion which combines threshold-free (i.e., strong) demographic parity with ROC-AUC termed SCAFF -- Splitting Criterion AUC for Fairness -- and easily extends to bagged and boosted tree frameworks. Our method simultaneously leverages multiple sensitive attributes of which the values may be multicategorical or intersectional, and is tunable with respect to the unavoidable performance-fairness trade-off. In our experiments, we demonstrate how SCAFF generates models with performance and fairness with respect to binary, multicategorical, and multiple sensitive attributes.

DMJul 16, 2018
Computing Minimum Weight Cycles to Leverage Mispricings in Cryptocurrency Market Networks

Francesco Bortolussi, Zeger Hoogeboom, Frank W. Takes

Cryptocurrencies such as Bitcoin and Ethereum have recently gained a lot of popularity, not only as a digital form of currency but also as an investment vehicle. Online marketplaces and exchanges allow users across the world to convert between dozens of different cryptocurrencies and regular currencies such as euros or dollars. Due to the novelty of this concept, the volatility of these markets and the differences in maturity and usage of particular marketplaces, currency pairs may appear at multiple marketplaces but at different trading prices. This paper proposes a novel algorithmic approach to take advantage of these mispricings and capitalize upon the pricing differences that exist between exchanges and currency pairs. To do so, we model each combination of a currency and a market as one node in a graph. A directed link between two nodes indicates that a conversion between these two currency/market pairs is possible. The weight of the link relates to the exchange rate of executing this particular currency exchange. To leverage the mispricings, we seek for cycles in the graph such that upon multiplying the weights of the links in the cycle, a value greater than 1 is found and thus a profit can be made. Our goal is to do this efficiently, without exhaustively enumerating all possible cycles in the graph. Therefore, we convert our data and address the problem in terms of finding minimum weight triangles in graphs with integer weights, for which efficient algorithms can be utilized. We experiment with parameter settings (heuristics) related to the conversion of exchange rate data into integer weight values. We show that our approach improves upon a reasonable baseline algorithm in terms of computation time. Furthermore, using a real-world dataset, we demonstrate how the obtained minimal weight cycles indeed unveil a number of currency exchange cycles that result in a net profit.