24.2 | CE | Apr 13 | Code
Divergence-Guided Particle Swarm Optimization
Kleyton da Costa, Bernardo Modenesi, Ivan F. M. Menezes et al.
Particle Swarm Optimization (PSO) is susceptible to premature convergence when the swarm collapses around the global best, particularly on multimodal landscapes in higher dimensions. We propose Divergence-guided PSO (DPSO), which augments the velocity update with a modulation term that repels particles whose personal bests have converged near the global best. The repulsion is gated by a Gaussian similarity kernel, which we prove is equivalent to an exponentially decaying function of the KL divergence between Gaussian-embedded personal and global bests, connecting the mechanism to the family of $f$-divergences and providing a principled basis for kernel design. Experiments on 36 benchmark functions (15 unimodal, 21 multimodal) across dimensions $D \in \{10, 30, 50\}$, each with 30 independent runs, show that DPSO frequently outperforms standard PSO on multimodal problems, with improvements of 2-8$\times$ on functions such as Pinter, Ackley, and Levy, and up to 5$\times$ reduction in run-to-run variance. On unimodal landscapes the modulation term is counterproductive, confirming that DPSO targets the exploration-exploitation trade-off rather than offering a universal improvement. The method adds one hyperparameter, incurs 15-25% wall-clock overhead, and does not increase the asymptotic per-iteration complexity of PSO. The project code is available here: https://github.com/Kleyt0n/dpso
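The gating idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact form of the modulation term is not given here, so the repulsion below (a kernel-weighted push away from the global best) is an assumed form, and the names `gamma`, `sigma`, `dpso_velocity`, and the default coefficients are hypothetical. The one fact the sketch relies on is standard: for two Gaussians sharing the covariance $\sigma^2 I$, the KL divergence equals $\|p - g\|^2 / (2\sigma^2)$, so the Gaussian kernel is exactly $\exp(-\mathrm{KL})$.

```python
import math
import random


def gaussian_similarity(pbest, gbest, sigma):
    """Gaussian kernel exp(-||pbest - gbest||^2 / (2 * sigma^2))."""
    sq_dist = sum((p - g) ** 2 for p, g in zip(pbest, gbest))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kl_isotropic_gaussians(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) for a shared isotropic covariance."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(mu_p, mu_q))
    return sq_dist / (2.0 * sigma ** 2)


def dpso_velocity(x, v, pbest, gbest, rng,
                  w=0.7, c1=1.5, c2=1.5, gamma=0.5, sigma=1.0):
    """One velocity update: standard PSO terms plus a similarity-gated repulsion.

    ASSUMPTION: the repulsion form and the names gamma/sigma are illustrative,
    not taken from the paper.
    """
    sim = gaussian_similarity(pbest, gbest, sigma)  # near 1 when pbest sits on gbest
    new_v = []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        standard = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        repulsion = -gamma * sim * (gi - xi)  # pushes away from gbest when sim is high
        new_v.append(standard + repulsion)
    return new_v
```

With `gamma=0` the update reduces to the standard PSO velocity rule, which makes it easy to compare the two variants under the same random seed.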
CL | Nov 3, 2023
An Interdisciplinary Outlook on Large Language Models for Scientific Research
James Boyko, Joseph Cohen, Nathan Fox et al.
In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision. We examine how LLMs augment scientific inquiry, offering concrete examples such as accelerating literature review by summarizing vast numbers of publications, enhancing code development through automated syntax correction, and refining the scientific writing process. Simultaneously, we articulate the challenges LLMs face, including their reliance on extensive and sometimes biased datasets, and the potential ethical dilemmas stemming from their use. Our critical discussion extends to the varying impacts of LLMs across fields, from the natural sciences, where they help model complex biological sequences, to the social sciences, where they can parse large-scale qualitative data. We conclude by offering a nuanced perspective on how LLMs can be both a boon and a boundary to scientific progress.
LG | Feb 23, 2023
Evaluating Explainability in Machine Learning Predictions through Explainer-Agnostic Metrics
Cristian Munoz, Kleyton da Costa, Bernardo Modenesi et al.
The rapid integration of artificial intelligence (AI) into various industries has introduced new challenges in governance and regulation, particularly regarding the understanding of complex AI systems. A critical demand from decision-makers is the ability to explain the results of machine learning models, which is essential for fostering trust and ensuring ethical AI practices. In this paper, we develop six distinct model-agnostic metrics designed to quantify the extent to which model predictions can be explained. These metrics measure different aspects of model explainability, covering local importance, global importance, and surrogate predictions, and allow for a comprehensive evaluation of how models generate their outputs. Furthermore, by computing our metrics, we can rank models in terms of explainability criteria such as importance concentration and consistency, prediction fluctuation, and surrogate fidelity and stability, offering a valuable tool for selecting models based not only on accuracy but also on transparency. We demonstrate the practical utility of these metrics on classification and regression tasks, and integrate them into an existing Python package for public use.
18.1 | CE | May 9
GraphNetz: Statistical Benchmarking of Graph Neural Networks with Paired Tests and Rank Aggregation
Kleyton da Costa, Bernardo Modenesi
Graph Neural Network (GNN) benchmarks often report single point estimates, even when performance differences are small relative to variation across random seeds, train/test splits, and datasets. Confidence intervals, paired comparisons, multiple-comparison correction, and rank-based aggregation are standard statistical tools, but they are rarely the default output of graph-learning benchmark suites. We introduce GraphNetz, a benchmarking framework whose default output is a structured statistical report rather than a raw accuracy table. GraphNetz currently includes 63 dataset loaders, four task types, and five canonical GNN architectures, while also supporting custom datasets and models. The framework standardizes multi-seed evaluation and automatically returns per-cell confidence intervals, Holm-corrected paired tests, and Friedman-Nemenyi critical-difference diagrams across tasks. In a cross-category benchmark over ten heterogeneous tasks, apparent rank differences among four canonical node-level encoders fall within a single Nemenyi clique, indicating that none is significantly better than the others at $\alpha = 0.05$. GraphNetz therefore provides researchers with a reproducible computational and statistical pipeline for benchmarking new graph-learning methods against standard architectures across different tasks and a wide set of applications, while reporting principled statistical evidence that accounts for seed uncertainty. The framework aims to give the graph-learning community reproducible, honest model comparisons that are ready to be included in papers.
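To make "Holm-corrected paired tests" concrete, here is a small standard-library sketch. It does not use or reproduce the GraphNetz API, and the abstract does not say which paired test the framework runs; the sketch swaps in an exact sign-flip permutation test on per-seed score differences as one reasonable choice. The function names are hypothetical. The Holm step-down adjustment itself is the standard procedure: sort the p-values, multiply the k-th smallest by (m - k + 1), cap at 1, and enforce monotonicity.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation p-value for paired per-seed scores.

    ASSUMPTION: this test is a stand-in; the framework's actual paired test
    is not specified in the abstract.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Enumerate every sign assignment of the paired differences (2^n patterns).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted
```

A typical use would be to compute one paired p-value per model pair from scores matched by seed, then pass the whole list through `holm_adjust` before comparing against the significance level. Note that with n seeds the smallest attainable sign-flip p-value is 2 / 2^n, so very few seeds cannot yield significance.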
LG | Sep 6, 2025
Machine Generalize Learning in Agent-Based Models: Going Beyond Surrogate Models for Calibration in ABMs
Sima Najafzadehkhoei, George Vega Yon, Bernardo Modenesi et al.
Calibrating agent-based epidemic models is computationally demanding. We present a supervised machine learning calibrator that learns the inverse mapping from epidemic time series to SIR parameters. A three-layer bidirectional LSTM ingests 60-day incidence together with population size and recovery rate, and outputs transmission probability, contact rate, and R0. Training uses a composite loss with an epidemiology-motivated consistency penalty that encourages R0 * recovery rate to equal transmission probability * contact rate. In a 1000-scenario simulation study, we compare the calibrator with Approximate Bayesian Computation (likelihood-free MCMC). The method achieves lower error across all targets (MAE: R0 0.0616 vs 0.275; transmission 0.0715 vs 0.128; contact 1.02 vs 4.24), produces tighter predictive intervals with near-nominal coverage, and reduces wall-clock time from 77.4 s to 2.35 s per calibration. Although contact rate and transmission probability are partially nonidentifiable, the approach reproduces epidemic curves more faithfully than ABC, enabling fast and practical calibration. We evaluate it on SIR agent-based epidemics generated with epiworldR and provide an implementation in R.
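The consistency penalty described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' R implementation: the abstract gives only the identity being encouraged (R0 * recovery rate = transmission probability * contact rate), so the squared-residual form, the mean-absolute-error data term, the `weight` value, and all function names are hypothetical.

```python
def consistency_penalty(r0, recovery_rate, transmission_prob, contact_rate):
    """Squared residual of R0 * recovery_rate = transmission_prob * contact_rate.

    ASSUMPTION: the squared form is illustrative; the paper's exact penalty
    is not stated in the abstract.
    """
    residual = r0 * recovery_rate - transmission_prob * contact_rate
    return residual ** 2


def composite_loss(pred, target, recovery_rate, weight=0.1):
    """Data-fit term plus weighted consistency penalty.

    pred and target are (transmission_prob, contact_rate, r0) tuples.
    ASSUMPTION: MAE as the data term and weight=0.1 are placeholders.
    """
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    transmission_prob, contact_rate, r0 = pred
    penalty = consistency_penalty(r0, recovery_rate, transmission_prob, contact_rate)
    return mae + weight * penalty
```

A prediction that matches the target and satisfies the identity incurs zero loss, while a prediction whose three outputs are mutually inconsistent is penalized even if each output is individually close to its target.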