Nuno Fachada

h-index: 13

11 papers

126 citations

Novelty: 23%

AI Score: 43

Ranked #52,492 of 194,257 authors (top 27%); #428 in IV (top 10%)

11 Papers

3.8 | LG | Jan 24, 2023 | Code

Generating Multidimensional Clusters With Support Lines

Nuno Fachada, Diogo de Andrade

Synthetic data is essential for assessing clustering techniques, complementing and extending real data, and allowing for more complete coverage of a given problem's space. In turn, synthetic data generators have the potential of creating vast amounts of data, a crucial activity when real-world data is at a premium, while providing a well-understood generation procedure and an interpretable instrument for methodically investigating cluster analysis algorithms. Here, we present Clugen, a modular procedure for synthetic data generation, capable of creating multidimensional clusters supported by line segments using arbitrary distributions. Clugen is open source, comprehensively unit tested and documented, and is available for the Python, R, Julia, and MATLAB/Octave ecosystems. We demonstrate that our proposal can produce rich and varied results in various dimensions, is fit for use in the assessment of clustering algorithms, and has the potential to be a widely used framework in diverse clustering-related research tasks.

1.9 | CL | Dec 22, 2022 | Code

MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification

Alina Petukhova, Nuno Fachada

This article presents a dataset of 10,917 news articles with hierarchical news categories collected between 1 January 2019 and 31 December 2019. We manually labeled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.

7.9 | HC | Jun 3

Closing the Loop in Affect-Driven Game Adaptation: A Systematic Review

Phil Lopes, Nuno Fachada, Maria Fonseca

Recognizing player state is only one component of affective game adaptation; inferred experience must also be translated into adaptive interventions that modify gameplay or game content. Although player experience modeling and content adaptation are established research areas, fewer studies examine how sensing, modeling, and adaptation are integrated into complete, empirically evaluated gameplay systems. This PRISMA-guided systematic review analyzes 23 empirical studies published from January 1, 2015, to December 31, 2025, that implement a complete experience-driven loop defined here as the combination of player data acquisition, player experience modeling, and adaptive game content. Complete-loop systems were relatively uncommon in the retrieved corpus, and the selected systems were predominantly oriented toward dynamic difficulty adjustment, engagement, rehabilitation, or performance-related goals. Game telemetry was the dominant input modality, while non-invasive sources with affective relevance, such as facial expression analysis and peripheral interaction data, were less common. Knowledge-based methods, including rule-based systems and heuristics, dominated both modeling and adaptation because of their interpretability and low deployment requirements, whereas machine learning approaches were less frequent and remained constrained by data availability, transparency, and runtime integration challenges. Most importantly, affective information was often used to support challenge calibration or related adaptation objectives, while stress, anxiety, horror, and related affective states were rarely addressed as explicit adaptation targets. These findings identify a gap within this review scope: affective information may enter an adaptive loop without making affective state the objective of adaptation.

8.8 | SE | Apr 30

Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study

Nuno Fachada, Daniel Fernandes, Carlos M. Fernandes et al.

Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference. Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against a validated NetLogo baseline, and quantitative measures of runtime efficiency and maintainability. Results show that behaviorally faithful implementations are achievable but not guaranteed, and that executability alone is insufficient for scientific use. GPT-4.1 consistently produces statistically valid and efficient implementations, with Claude 3.7 Sonnet performing well but less reliably. Overall, the findings clarify both the promise and current limitations of LLMs as model engineering tools, with implications for reproducible agent-based and ecological modeling.

5.3 | IV | Sep 4, 2023

Multispectral Indices for Wildfire Management

Afonso Oliveira, João P. Matos-Carvalho, Filipe Moutinho et al.

The increasing frequency and severity of wildfires require advanced methods for effective surveillance and management. Traditional ground-based observation techniques often struggle to adapt to rapidly changing fire behavior and environmental conditions. This paper examines the application of multispectral aerial and satellite imagery in wildfire management, emphasizing the identification and analysis of key factors influencing wildfire behavior, such as combustible vegetation and water features. Through a comprehensive review of current literature and the presentation of two practical case studies, we assess various multispectral indices and evaluate their effectiveness in extracting critical environmental attributes essential for wildfire prevention and management. Our case studies highlight several indices as particularly effective for segmentation and extraction: NDVI for vegetation, MNDWI for water features, and MSR for artificial structures. These indices significantly enhance wildfire data processing, thereby supporting improved monitoring and response strategies.
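
For readers unfamiliar with two of the indices named in this abstract, their standard textbook definitions are sketched below. The notation is an assumption made here, not taken from the paper: each ρ denotes surface reflectance in the named band, and the exact sensor bands used in the case studies are not stated in the abstract.

```latex
\[
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}},
\qquad
\mathrm{MNDWI} = \frac{\rho_{\mathrm{Green}} - \rho_{\mathrm{SWIR}}}{\rho_{\mathrm{Green}} + \rho_{\mathrm{SWIR}}}
\]
```

Both indices are normalized differences and therefore lie in the range [-1, 1]. High NDVI values typically indicate dense green vegetation, and positive MNDWI values typically indicate open water.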

15.4 | CL | Mar 22, 2024

Text Clustering with Large Language Model Embeddings

Alina Petukhova, João P. Matos-Carvalho, Nuno Fachada

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms. This study argues that recent advancements in large language models (LLMs) have the potential to enhance this task. The research investigates how different textual embeddings, particularly those utilised in LLMs, and various clustering algorithms influence the clustering of text datasets. A series of experiments were conducted to evaluate the impact of embeddings on clustering results, the role of dimensionality reduction through summarisation, and the adjustment of model size. The findings indicate that LLM embeddings are superior at capturing subtleties in structured language. OpenAI's GPT-3.5 Turbo model yields better results in three out of five clustering metrics across most tested datasets. Most LLM embeddings show improvements in cluster purity and provide a more informative silhouette score, reflecting a refined structural understanding of text data compared to traditional methods. Among the more lightweight models, BERT demonstrates leading performance. Additionally, it was observed that increasing model dimensionality and employing summarisation techniques do not consistently enhance clustering efficiency, suggesting that these strategies require careful consideration for practical application. These results highlight a complex balance between the need for refined text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by integrating embeddings from LLMs, offering improved methodologies and suggesting new avenues for future research in various types of textual analysis.

15.9 | SE | Jul 30, 2025

GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries

Nuno Fachada, Daniel Fernandes, Carlos M. Fernandes et al.

Large Language Models (LLMs) have advanced rapidly as tools for automating code generation in scientific research, yet their ability to interpret and use unfamiliar Python APIs for complex computational experiments remains poorly characterized. This study systematically benchmarks a selection of state-of-the-art LLMs in generating functional Python code for two increasingly challenging scenarios: conversational data analysis with the ParShift library, and synthetic data generation and clustering using pyclugen and scikit-learn. Both experiments use structured, zero-shot prompts specifying detailed requirements but omitting in-context examples. Model outputs are evaluated quantitatively for functional correctness and prompt compliance over multiple runs, and qualitatively by analyzing the errors produced when code execution fails. Results show that only a small subset of models consistently generate correct, executable code. GPT-4.1 achieved a 100% success rate across all runs in both experimental tasks, whereas most other models succeeded in fewer than half of the runs, with only Grok-3 and Mistral-Large approaching comparable performance. In addition to benchmarking LLM performance, this approach helps identify shortcomings in third-party libraries, such as unclear documentation or obscure implementation bugs. Overall, these findings highlight current limitations of LLMs for end-to-end scientific automation and emphasize the need for careful prompt design, comprehensive library documentation, and continued advances in language model capabilities.

3.6 | IV | Apr 9, 2024 | Code

Raster Forge: Interactive Raster Manipulation Library and GUI for Python

Afonso Oliveira, Nuno Fachada, João P. Matos-Carvalho

Raster Forge is a Python library and graphical user interface for raster data manipulation and analysis. The tool is focused on remote sensing applications, particularly in wildfire management. It allows users to import, visualize, and process raster layers for tasks such as image compositing or topographical analysis. For wildfire management, it generates fuel maps using predefined models. Its impact extends from disaster management to hydrological modeling, agriculture, and environmental monitoring. Raster Forge can be a valuable asset for geoscientists and researchers who rely on raster data analysis, enhancing geospatial data processing and visualization across various disciplines.

3.6 | IV | Apr 4, 2024

Data Science for Geographic Information Systems

Afonso Oliveira, Nuno Fachada, João P. Matos-Carvalho

The integration of data science into Geographic Information Systems (GIS) has facilitated the evolution of these tools into complete spatial analysis platforms. The adoption of machine learning and big data techniques has equipped these platforms with the capacity to handle larger amounts of increasingly complex data, transcending the limitations of more traditional approaches. This work traces the historical and technical evolution of data science and GIS as fields of study, highlighting the critical points of convergence between domains, and underlining the many sectors that rely on this integration. A GIS application is presented as a case study in the disaster management sector where we utilize aerial data from Tróia, Portugal, to emphasize the process of insight extraction from raw data. We conclude by outlining prospects for future research in the integration of these fields in general, and the developed application in particular.

1.2 | CG | Jun 1, 2024

Generating 3D Terrain with 2D Cellular Automata

Nuno Fachada, António R. Rodrigues, Diogo de Andrade et al.

This paper explores the use of 2D cellular automata (CA) to generate 3D terrains through a simple additive approach. Experimenting with multiple CA transition rules produced aesthetically interesting, navigable landscapes, suggesting applicability for terrain generation in games.
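
The abstract does not spell out the additive approach, so the following is only a minimal sketch of one plausible reading: run a 2D binary CA for a number of steps and add each generation's live cells into a heightmap. The function names `step` and `heightmap`, the Life-like default rule, the toroidal boundary, and the random initial fill are all assumptions chosen for illustration, not the authors' method.

```python
import random


def step(grid, birth=(3,), survive=(2, 3)):
    """Apply one synchronous update of a Life-like 2D CA on a toroidal grid.

    The birth/survive sets are illustrative defaults (Conway's rule),
    not rules taken from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight Moore neighbours, wrapping at the edges.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if grid[r][c]:
                new[r][c] = 1 if n in survive else 0
            else:
                new[r][c] = 1 if n in birth else 0
    return new


def heightmap(rows, cols, steps, fill=0.5, seed=0):
    """Accumulate CA generations into integer heights (assumed additive scheme)."""
    rng = random.Random(seed)
    grid = [[1 if rng.random() < fill else 0 for _ in range(cols)]
            for _ in range(rows)]
    heights = [[0] * cols for _ in range(rows)]
    for _ in range(steps):
        # Each live cell raises the terrain at its position by one unit.
        for r in range(rows):
            for c in range(cols):
                heights[r][c] += grid[r][c]
        grid = step(grid)
    return heights


if __name__ == "__main__":
    for row in heightmap(8, 8, steps=5, seed=1):
        print(" ".join(str(v) for v in row))
```

Each height lies between 0 and `steps`, so the result can be rescaled and fed to a mesh or terrain component. Different `birth` and `survive` sets give visibly different landscapes, which is in the spirit of the abstract's remark about experimenting with multiple transition rules.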

2.4 | AI | Jul 30, 2021 | Code

Procedural Generation of 3D Maps with Snappable Meshes

Rafael C. e Silva, Nuno Fachada, Diogo de Andrade et al.

In this paper we present a technique for procedurally generating 3D maps using a set of premade meshes which snap together based on designer-specified visual constraints. The proposed approach avoids size and layout limitations, offering the designer control over the look and feel of the generated maps, as well as immediate feedback on a given map's navigability. A prototype implementation of the method, developed in the Unity game engine, is discussed, and a number of case studies are analyzed. These include a multiplayer game where the method was used, together with a number of illustrative examples which highlight various parameterizations and piece selection methods. The technique can be used as a designer-centric map composition method and/or as a prototyping system in 3D level design, opening the door for quality map and level creation in a fraction of the time of a fully human-based approach.