Samin Aref

SI
h-index17
10papers
94citations
Novelty40%
AI Score48

10 Papers

SISep 10, 2022Code
Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity

Samin Aref, Mahdi Mostajabdaveh, Hriday Chheda

Community detection is a classic network problem with extensive applications in various fields. Its most common method is using modularity maximization heuristics which rarely return an optimal partition or anything similar. Partitions with globally optimal modularity are difficult to compute, and therefore have been underexplored. Using structurally diverse networks, we compare 30 community detection methods including our proposed algorithm that offers optimality and approximation guarantees: the Bayan algorithm. Unlike existing methods, Bayan globally maximizes modularity or approximates it within a factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions in retrieving planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well clusteredness. These advantages come at the cost of additional computations which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python.

SIOct 17, 2023
Analyzing Modularity Maximization in Approximation, Heuristic, and Graph Neural Network Algorithms for Community Detection

Samin Aref, Mahdi Mostajabdaveh

Community detection, which involves partitioning nodes within a network, has widespread applications across computational sciences. Modularity-based algorithms identify communities by attempting to maximize the modularity function across network node partitions. Our study assesses the performance of various modularity-based algorithms in obtaining optimal partitions. Our analysis utilizes 104 networks, including both real-world instances from diverse contexts and modular graphs from two families of synthetic benchmarks. We analyze ten inexact modularity-based algorithms against the exact integer programming baseline that globally optimizes modularity. Our comparative analysis includes eight heuristics, two variants of a graph neural network algorithm, and nine variations of the Bayan approximation algorithm. Our findings reveal that the average modularity-based heuristic yields optimal partitions in only 43.9% of the 104 networks analyzed. Graph neural networks and approximate Bayan, on average, achieve optimality on 68.7% and 82.3% of the networks respectively. Additionally, our analysis of three partition similarity metrics exposes substantial dissimilarities between high-modularity sub-optimal partitions and any optimal partition of the networks. We observe that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of the commonly used modularity-based methods: they rarely produce an optimal partition or a partition resembling an optimal partition even on networks with modular structures. If modularity is to be used for detecting communities, we recommend approximate optimization algorithms for a more methodologically sound usage of modularity within its applicability limits.

SIFeb 28, 2023
Heuristic Modularity Maximization Algorithms for Community Detection Rarely Return an Optimal Partition or Anything Similar

Samin Aref, Mahdi Mostajabdaveh, Hriday Chheda

Community detection is a fundamental problem in computational sciences with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize modularity over different partitions of the network nodes. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We evaluate (1) the ratio of the algorithms' output modularity to the maximum modularity for each input graph, and (2) the maximum similarity between their output partition and any optimal partition of that graph. We compare eight existing heuristic algorithms against an exact integer programming method that globally maximizes modularity. The average modularity-based heuristic algorithm returns optimal partitions for only 19.4% of the 80 graphs considered. Additionally, results on adjusted mutual information reveal substantial dissimilarity between the sub-optimal partitions and any optimal partition of the networks in our experiments. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities, exact or approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits.

45.7SOC-PHMay 24
Heuristic and exact modularity optimization with size-constrained communities

Filipi N. Silva, Samin Aref, Vincent Traag et al.

When searching for communities in networks, domain experts may have some prior expectations about the size of communities. Yet, community detection methods normally do not optimize communities under cluster size constraints. Multi-resolution techniques allow users to indirectly control the average community size through changing a resolution parameter, but this practice does not control the size of individual communities. We here study the problem of size-constrained community detection, where the size of all communities is limited to a user-specified range of values, in the context of modularity optimization. We propose a heuristic for modularity optimization under community size constraints. To demonstrate the reliability of our proposed heuristic, we also formulate an exact integer optimization model and use its results as a baseline. Our analysis based on synthetic benchmarks and real networks demonstrate the issues with the currently common practice of changing resolution parameters and reveal the advantages of the proposed methods as a principled way of obtaining size-constrained communities. The proposed method is publicly available in the Python Leiden algorithm package.

LGApr 14, 2025Code
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining

Deyu Cao, Samin Aref

The growing use of large language models has raised environmental and economic concerns about their intensity of resource usage during inference. Serving these models to each user requires substantial energy and water for cooling. Model compression techniques like quantization can shrink large language models and make them more resource efficient at the cost of potential performance degradation. Quantization methods compress model size through replacing their high-precision parameters by quantized values of lower precision. Among existing methods, the ApiQ method achieves superior accuracy preservation at minimal memory and time overhead. We investigate two ideas to extend performance in ultra-low-bit quantization beyond ApiQ's level. First, we look into combining existing quantization-aware training techniques with ApiQ's partial training. We show that this does not outperform the baseline ApiQ method with limited training data and frozen weights. This leads to two key insights: (1) The substantial representational capacity that is gained through full retraining is unlikely to be feasible through partial training. (2) This gain may depend on using a large and diverse dataset in quantization-aware training. Second, through a novel approach informed by the two insights, we propose an ultra-low-bit quantization method that builds upon ApiQ and extends its performance without the need for full retraining. This publicly available method relies on a saliency-aware regularization term that prioritizes preserving the most impactful parameters during quantization. Our experiments on LLaMA 7B and 13B benchmarks demonstrate that our method reduces the ApiQ's accuracy degradation by 10.85% and 7.54% respectively. A Python implementation of the proposed quantization method is publicly available on GitHub https://github.com/TokuyuSou/ULB-SAPR.

AIAug 19, 2025Code
Discrete Optimization of Min-Max Violation and its Applications Across Computational Sciences

Cheikh Ahmed, Mahdi Mostajabdaveh, Samin Aref et al.

We introduce the Discrete Min-Max Violation (DMMV) as a general optimization problem which seeks an assignment of discrete values to variables that minimizes the largest constraint violation. This context-free mathematical formulation is applicable to a wide range of use cases that have worst-case performance requirements. After defining the DMMV problem mathematically, we explore its properties to establish a foundational understanding. To tackle DMMV instance sizes of practical relevance, we develop a GPU-accelerated heuristic that takes advantage of the mathematical properties of DMMV for speeding up the solution process. We demonstrate the versatile applicability of our heuristic by solving three optimization problems as use cases: (1) post-training quantization of language models, (2) discrete tomography, and (3) Finite Impulse Response (FIR) filter design. In quantization without outlier separation, our heuristic achieves 14% improvement on average over existing methods. In discrete tomography, it reduces reconstruction error by 16% under uniform noise and accelerates computations by a factor of 6 on GPU. For FIR filter design, it nearly achieves 50% ripple reduction compared to using the commercial integer optimization solver, Gurobi. Our comparative results point to the benefits of studying DMMV as a context-free optimization problem and the advantages that our proposed heuristic offers on three distinct problems. Our GPU-accelerated heuristic will be made open-source to further stimulate research on DMMV and its other applications. The code is available at https://anonymous.4open.science/r/AMVM-5F3E/

AIAug 16, 2025Code
EvoCut: Strengthening Integer Programs via Evolution-Guided Language Models

Milad Yazdani, Mahdi Mostajabdaveh, Samin Aref et al.

Integer programming lies at the heart of crucial combinatorial optimization tasks but remains challenging due to its NP-hard nature. An effective approach for practically solving integer programs is the manual design of acceleration cuts, i.e. inequalities that improve solver performance. However, this creative process demands deep expertise and is yet to be automated. Our proposed framework, EvoCut, automates the generation of acceleration cuts by combining large language models (LLMs) with an evolutionary search. EvoCut (i) initializes a diverse population of candidate cuts via an LLM-based initializer agent; (ii) for each cut empirically evaluates both preservation of the optimal solution and its ability to cut off fractional solutions across a verification set; and (iii) iteratively refines the population through evolutionary crossover and mutation agents. We quantify each cut's utility by its relative reduction in the solver's optimality gap. Our comparisons against standard integer programming practice show that EvoCut reduces optimality gap by 17-57% within a fixed time. It obtains the same solutions up to 4 times as fast, and obtains higher-quality solutions within the same time limit. Requiring no human expert input, EvoCut reliably generates, improves, and empirically verifies cuts that generalize to unseen instances. The code is available at https://github.com/milad1378yz/EvoCut.

SIApr 15, 2025
Robust Markov stability for community detection at a scale learned based on the structure

Samin Aref, Sanchaai Mathiyarasan

Community detection, the unsupervised task of clustering nodes of a graph, finds applications across various fields. The common approaches for community detection involve optimizing an objective function to partition the nodes into communities at a single scale of granularity. However, the single-scale approaches often fall short of producing partitions that are robust and at a suitable scale. The existing algorithm, PyGenStability, returns multiple robust partitions for a network by optimizing the multi-scale Markov stability function. However, in cases where the suitable scale is not known or assumed by the user, there is no principled method to select a single robust partition at a suitable scale from the multiple partitions that PyGenStability produces. Our proposed method combines the Markov stability framework with a pre-trained machine learning model for scale selection to obtain one robust partition at a scale that is learned based on the graph structure. This automatic scale selection involves using a gradient boosting model pre-trained on hand-crafted and embedding-based network features from a labeled dataset of 10k benchmark networks. This model was trained to predicts the scale value that maximizes the similarity of the output partition to the planted partition of the benchmark network. Combining our scale selection algorithm with the PyGenStability algorithm results in PyGenStabilityOne (PO): a hyperparameter-free multi-scale community detection algorithm that returns one robust partition at a suitable scale without the need for any assumptions, input, or tweaking from the user. We compare the performance of PO against 29 algorithms and show that it outperforms 25 other algorithms by statistically meaningful margins. Our results facilitate choosing between community detection algorithms, among which PO stands out as the accurate, robust, and hyperparameter-free method.

DLOct 15, 2021
Return migration of German-affiliated researchers: Analyzing departure and return by gender, cohort, and discipline using Scopus bibliometric data 1996-2020

Xinyi Zhao, Samin Aref, Emilio Zagheni et al.

The international migration of researchers is an important dimension of scientific mobility, and has been the subject of considerable policy debate. However, tracking the migration life courses of researchers is challenging due to data limitations. In this study, we use Scopus bibliometric data on eight million publications from 1.1 million researchers who have published at least once with an affiliation address from Germany in 1996-2020. We construct the partial life histories of published researchers in this period and explore both their out-migration and the subsequent return of a subset of this group: the returnees. Our analyses shed light on the career stages and gender disparities between researchers who remain in Germany, those who emigrate, and those who eventually return. We find that the return migration streams are even more gender imbalanced, which points to the need for additional efforts to encourage female researchers to come back to Germany. We document a slightly declining trend in return migration among more recent cohorts of researchers who left Germany, which, for most disciplines, was associated with a decrease in the German collaborative ties of these researchers. Moreover, we find that the gender disparities for the most gender imbalanced disciplines are unlikely to be mitigated by return migration given the gender compositions of the cohorts of researchers who have left Germany and of those who have returned. This analysis uncovers new dimensions of migration among scholars by investigating the return migration of published researchers, which is critical for the development of science policy.

CRSep 4, 2020
Evaluating the Security and Economic Effects of Moving Target Defense Techniques on the Cloud

Hooman Alavizadeh, Samin Aref, Dong Seong Kim et al.

Moving Target Defense (MTD) is a proactive security mechanism which changes the attack surface aiming to confuse attackers. Cloud computing leverages MTD techniques to enhance cloud security posture against cyber threats. While many MTD techniques have been applied to cloud computing, there has not been a joint evaluation of the effectiveness of MTD techniques with respect to security and economic metrics. In this paper, we first introduce mathematical definitions for the combination of three MTD techniques: \emph{Shuffle}, \emph{Diversity}, and \emph{Redundancy}. Then, we utilize four security metrics including system risk, attack cost, return on attack, and reliability to assess the effectiveness of the combined MTD techniques applied to large-scale cloud models. Secondly, we focus on a specific context based on a cloud model for E-health applications to evaluate the effectiveness of the MTD techniques using security and economic metrics. We introduce (1) a strategy to effectively deploy Shuffle MTD technique using a virtual machine placement technique and (2) two strategies to deploy Diversity MTD technique through operating system diversification. As deploying Diversity incurs cost, we formulate the \emph{Optimal Diversity Assignment Problem (O-DAP)} and solve it as a binary linear programming model to obtain the assignment which maximizes the expected net benefit.