Guilherme Seidyo Imai Aldeia

h-index20

5papers

27citations

Novelty51%

AI Score37

Ranked #115,643 of 205,806 authors (top 56%)#25,272 in LG (top 60%)

5 Papers

LGDec 8, 2025

Towards symbolic regression for interpretable clinical decision scores

Guilherme Seidyo Imai Aldeia, Joseph D. Romano, Fabricio Olivetti de Franca et al.

Medical decision-making makes frequent use of algorithms that combine risk equations with rules, providing clear and standardized treatment pathways. Symbolic regression (SR) traditionally limits its search space to continuous function forms and their parameters, making it difficult to model this decision-making. However, due to its ability to derive data-driven, interpretable models, SR holds promise for developing data-driven clinical risk scores. To that end we introduce Brush, an SR algorithm that combines decision-tree-like splitting algorithms with non-linear constant optimization, allowing for seamless integration of rule-based logic into symbolic regression and classification models. Brush achieves Pareto-optimal performance on SRBench, and was applied to recapitulate two widely used clinical scoring systems, achieving high accuracy and interpretable models. Compared to decision trees, random forests, and other SR methods, Brush achieves comparable or superior predictive performance while producing simpler models.

LGApr 8, 2024

Interpretability in Symbolic Regression: a benchmark of Explanatory Methods using the Feynman data set

Guilherme Seidyo Imai Aldeia, Fabricio Olivetti de Franca

In some situations, the interpretability of the machine learning models plays a role as important as the model accuracy. Interpretability comes from the need to trust the prediction model, verify some of its properties, or even enforce them to improve fairness. Many model-agnostic explanatory methods exists to provide explanations for black-box models. In the regression task, the practitioner can use white-boxes or gray-boxes models to achieve more interpretable results, which is the case of symbolic regression. When using an explanatory method, and since interpretability lacks a rigorous definition, there is a need to evaluate and compare the quality and different explainers. This paper proposes a benchmark scheme to evaluate explanatory methods to explain regression models, mainly symbolic regression models. Experiments were performed using 100 physics equations with different interpretable and non-interpretable regression methods and popular explanation methods, evaluating the performance of the explainers performance with several explanation measures. In addition, we further analyzed four benchmarks from the GP community. The results have shown that Symbolic Regression models can be an interesting alternative to white-box and black-box models that is capable of returning accurate models with appropriate explanations. Regarding the explainers, we observed that Partial Effects and SHAP were the most robust explanation models, with Integrated Gradients being unstable only with tree-based models. This benchmark is publicly available for further experiments.

NEApr 8, 2024

Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing

Guilherme Seidyo Imai Aldeia, Fabricio Olivetti de Franca, William G. La Cava

Symbolic regression (SR) searches for parametric models that accurately fit a dataset, prioritizing simplicity and interpretability. Despite this secondary objective, studies point out that the models are often overly complex due to redundant operations, introns, and bloat that arise during the iterative process, and can hinder the search with repeated exploration of bloated segments. Applying a fast heuristic algebraic simplification may not fully simplify the expression and exact methods can be infeasible depending on size or complexity of the expressions. We propose a novel agnostic simplification and bloat control for SR employing an efficient memoization with locality-sensitive hashing (LHS). The idea is that expressions and their sub-expressions traversed during the iterative simplification process are stored in a dictionary using LHS, enabling efficient retrieval of similar structures. We iterate through the expression, replacing subtrees with others of same hash if they result in a smaller expression. Empirical results shows that applying this simplification during evolution performs equal or better than without simplification in minimization of error, significantly reducing the number of nonlinear functions. This technique can learn simplification rules that work in general or for a specific problem, and improves convergence while reducing model complexity.

LGAug 7, 2025

Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models

Guilherme Seidyo Imai Aldeia, Daniel S. Herman, William G. La Cava

Large language models (LLMs) have demonstrated remarkable capabilities for medical question answering and programming, but their potential for generating interpretable computable phenotypes (CPs) is under-explored. In this work, we investigate whether LLMs can generate accurate and concise CPs for six clinical phenotypes of varying complexity, which could be leveraged to enable scalable clinical decision support to improve care for patients with hypertension. In addition to evaluating zero-short performance, we propose and test a synthesize, execute, debug, instruct strategy that uses LLMs to generate and iteratively refine CPs using data-driven feedback. Our results show that LLMs, coupled with iterative learning, can generate interpretable and reasonably accurate programs that approach the performance of state-of-the-art ML methods while requiring significantly fewer training examples.

LGFeb 11, 2019

Interaction-Transformation Evolutionary Algorithm for Symbolic Regression

Fabricio Olivetti de Franca, Guilherme Seidyo Imai Aldeia

The Interaction-Transformation (IT) is a new representation for Symbolic Regression that restricts the search space into simpler, but expressive, function forms. This representation has the advantage of creating a smoother search space unlike the space generated by Expression Trees, the common representation used in Genetic Programming. This paper introduces an Evolutionary Algorithm capable of evolving a population of IT expressions supported only by the mutation operator. The results show that this representation is capable of finding better approximations to real-world data sets when compared to traditional approaches and a state-of-the-art Genetic Programming algorithm.