Idelfonso B. R. Nogueira

LG
h-index18
13papers
57citations
Novelty49%
AI Score49

13 Papers

OCSep 13, 2022
A Robust Scientific Machine Learning for Optimization: A Novel Robustness Theorem

Luana P. Queiroz, Carine M. Rebello, Erber A. Costa et al.

Scientific machine learning (SciML) is a field of increasing interest in several different application fields. In an optimization context, SciML-based tools have enabled the development of more efficient optimization methods. However, implementing SciML tools for optimization must be rigorously evaluated and performed with caution. This work proposes the deductions of a robustness test that guarantees the robustness of multiobjective SciML-based optimization by showing that its results respect the universal approximator theorem. The test is applied in the framework of a novel methodology which is evaluated in a series of benchmarks illustrating its consistency. Moreover, the proposed methodology results are compared with feasible regions of rigorous optimization, which requires a significantly higher computational effort. Hence, this work provides a robustness test for guaranteed robustness in applying SciML tools in multiobjective optimization with lower computational effort than the existent alternative.

CEMar 22, 2023
Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach

Vinicius V. Santana, Erbet Costa, Carine M. Rebello et al.

This study presents a systematic machine learning approach for creating efficient hybrid models and discovering sorption uptake models in non-linear advection-diffusion-sorption systems. It demonstrates an effective method to train these complex systems using gradient based optimizers, adjoint sensitivity analysis, and JIT-compiled vector Jacobian products, combined with spatial discretization and adaptive integrators. Sparse and symbolic regression were employed to identify missing functions in the artificial neural network. The robustness of the proposed method was tested on an in-silico data set of noisy breakthrough curve observations of fixed-bed adsorption, resulting in a well-fitted hybrid model. The study successfully reconstructed sorption uptake kinetics using sparse and symbolic regression, and accurately predicted breakthrough curves using identified polynomials, highlighting the potential of the proposed framework for discovering sorption kinetic law structures.

CHEM-PHJul 6, 2023
PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction

Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz et al.

Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.

LGSep 13, 2022
A new Reinforcement Learning framework to discover natural flavor molecules

Luana P. Queiroz, Carine M. Rebello, Erbet A. Costa et al.

The flavor is the focal point in the flavor industry, which follows social tendencies and behaviors. The research and development of new flavoring agents and molecules are essential in this field. On the other hand, the development of natural flavors plays a critical role in modern society. In light of this, the present work proposes a novel framework based on Scientific Machine Learning to undertake an emerging problem in flavor engineering and industry. Therefore, this work brings an innovative methodology to design new natural flavor molecules. The molecules are evaluated regarding the synthetic accessibility, the number of atoms, and the likeness to a natural or pseudo-natural product.

21.8CHEM-PHApr 1
VIANA: character Value-enhanced Intensity Assessment via domain-informed Neural Architecture

Luana P. Queiroz, Icaro S. C. Bernardes, Ana M. Ribeiro et al.

Predicting the perceived intensity of odorants remains a fundamental challenge in sensory science due to the complex, non-linear behavior of their response, as well as the difficulty in correlating molecular structure with human perception. While traditional deep learning models, such as Graph Convolutional Networks (GCNs), excel at capturing molecular topology, they often fail to account for the biological and perceptual context of olfaction. This study introduces VIANA, a novel "tri-pillar" framework that integrates structural graph theory, character value embeddings, and phenomenological behavior. This methodology systematically evaluates knowledge transfer across three distinct domains: molecular structure via GCNs, semantic odor character values via Principal Odor Map (POM) embeddings, and biological dose-response logic via Hill's law. We demonstrate that knowledge transfer is not inherently positive; rather, a balance must be maintained in the volume of information provided to the model. While raw semantic data led to "information overload" in domain-informed models, applying Principal Component Analysis (PCA) to distill the 95% most impactful semantic variance yielded a superior "signal distillation" effect. Results indicate that the synthesis of these three knowledge transfer pillars significantly outperforms baseline structural models, with VIANA achieving a peak R^2 of 0.996 and a test Mean Squared Error (MSE) of 0.19. In this context, VIANA successfully captures the physical ceiling of saturation, the sensitivity of detection thresholds, and the nuance of odor character value expression, providing a domain grounded simulation of the human olfactory experience. This research provides a robust framework for digital olfaction, effectively bridging the gap between molecular informatics and sensory perception.

AINov 21, 2023
Digital Twin Framework for Optimal and Autonomous Decision-Making in Cyber-Physical Systems: Enhancing Reliability and Adaptability in the Oil and Gas Industry

Carine Menezes Rebello, Johannes Jäschkea, Idelfonso B. R. Nogueira

The concept of creating a virtual copy of a complete Cyber-Physical System opens up numerous possibilities, including real-time assessments of the physical environment and continuous learning from the system to provide reliable and precise information. This process, known as the twinning process or the development of a digital twin (DT), has been widely adopted across various industries. However, challenges arise when considering the computational demands of implementing AI models, such as those employed in digital twins, in real-time information exchange scenarios. This work proposes a digital twin framework for optimal and autonomous decision-making applied to a gas-lift process in the oil and gas industry, focusing on enhancing the robustness and adaptability of the DT. The framework combines Bayesian inference, Monte Carlo simulations, transfer learning, online learning, and novel strategies to confer cognition to the DT, including model hyperdimensional reduction and cognitive tack. Consequently, creating a framework for efficient, reliable, and trustworthy DT identification was possible. The proposed approach addresses the current gap in the literature regarding integrating various learning techniques and uncertainty management in digital twin strategies. This digital twin framework aims to provide a reliable and efficient system capable of adapting to changing environments and incorporating prediction uncertainty, thus enhancing the overall decision-making process in complex, real-world scenarios. Additionally, this work lays the foundation for further developments in digital twins for process systems engineering, potentially fostering new advancements and applications across various industrial sectors.

LGMar 1
MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules

Idelfonso B. R. Nogueira, Carine M. Rebelloa, Mumin Enis Leblebici et al.

Predicting physicochemical properties across chemical space is vital for chemical engineering, drug discovery, and materials science. Current molecular foundation models lack thermodynamic consistency, while domain-informed approaches are limited to single properties and small datasets. We introduce MultiPUFFIN, a domain-constrained multimodal foundation model addressing both limitations simultaneously. MultiPUFFIN features: (i) an encoder fusing SMILES, graphs, and 3D geometries via gated cross-modal attention, alongside experimental condition and descriptor encoders; (ii) prediction heads embedding established correlations (e.g., Wagner, Andrade, van't Hoff, and Shomate equations) as inductive biases to ensure thermodynamic consistency; and (iii) a two-stage multi-task training strategy.Extending prior frameworks, MultiPUFFIN predicts nine thermophysical properties simultaneously. It is trained on a multi-source dataset of 37,968 unique molecules (40,904 rows). With roughly 35 million parameters, MultiPUFFIN achieves a mean $R^2 = 0.716$ on a challenging scaffold-split test set of 8,877 molecules. Compared to ChemBERTa-2 (pre-trained on 77 million molecules), MultiPUFFIN outperforms the fine-tuned baseline across all nine properties despite using 2000x fewer training molecules. Advantages are strikingly apparent for temperature-dependent properties, where ChemBERTa-2 lacks the architectural capacity to incorporate thermodynamic conditions.These results demonstrate that multimodal encoding and domain-informed biases substantially reduce data and compute requirements compared to brute-force pre-training. Furthermore, MultiPUFFIN handles missing modalities and recovers meaningful thermodynamic parameters without explicit supervision. Systematic ablation studies confirm the property-specific benefits of these domain-informed prediction heads.

56.2LGMay 18
UTOPYA: A Multimodal Deep Learning Framework for Physics-Informed Anomaly Detection and Time-Series Prediction

Robson W. S. Pessoa, Julien Amblard, Alessandra Russo et al.

Anomaly detection in batch processes is hindered by transient dynamics, scarce fault labels, and reliance on single-modality sensor data. This work introduces UTOPYA (Unified Temporal Observation for Physics-Informed Anomaly Detection and Time-Series Prediction), a 15.2M-parameter multimodal framework that jointly addresses anomaly detection, time-series prediction, and phase classification in batch distillation by fusing eight data modalities through Feature-wise Linear Modulation (FiLM) conditioned cross-modal attention and gated fusion. A physics-informed regularisation scheme introduced in this work enforces temporal smoothness and thermodynamic monotonicity, while curriculum learning introduces training samples in order of physical difficulty. On the 119-experiment multimodal batch distillation dataset of Arweiler et al. (2026), UTOPYA achieves a window-level test AUROC of 0.832 and 0.874 under multi-signal experiment-level scoring, substantially outperforming four external baselines (PCA, autoencoder, Isolation Forest, and LSTM autoencoder) evaluated under identical conditions (+0.147 window-level AUROC over the best baseline). A multimodal ablation over 15~architectural configurations shows that static context via FiLM conditioning is the key enabler, lifting experiment-level multi-signal AUROC by +0.145 over the unimodal baseline (0.729 to 0.874). Separately, a training ablation across 14 design choices reveals that several widely-adopted techniques, including instance normalisation, Mixup, ensembling, test-time augmentation, and stochastic weight averaging, fail to improve or actively degrade generalisation in this data-scarce setting. These negative results expose a fundamental tension between smoothing-based regularisation and anomaly detection, providing practical guidance for multimodal process monitoring deployment.

SYOct 18, 2023
Dynamic financial processes identification using sparse regressive reservoir computers

Fredy Vides, Idelfonso B. R. Nogueira, Gabriela Lopez Gutierrez et al.

In this document, we present key findings in structured matrix approximation theory, with applications to the regressive representation of dynamic financial processes. Initially, we explore a comprehensive approach involving generic nonlinear time delay embedding for time series data extracted from a financial or economic system under examination. Subsequently, we employ sparse least-squares and structured matrix approximation methods to discern approximate representations of the output coupling matrices. These representations play a pivotal role in establishing the regressive models corresponding to the recursive structures inherent in a given financial system. The document further introduces prototypical algorithms that leverage the aforementioned techniques. These algorithms are demonstrated through applications in approximate identification and predictive simulation of dynamic financial and economic processes, encompassing scenarios that may or may not exhibit chaotic behavior.

SYNov 16, 2023
Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers

Fredy Vides, Idelfonso B. R. Nogueira, Gabriela Lopez Gutierrez et al.

The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a critical role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on the significant improvement on structured identification precision when compared to classical reservoir computing methods for the simulation of equivariant dynamical systems.

11.0LGApr 26
WISE-FM:Operation-Aware, Engineering-Informed Foundation Model for Multi-Task Well Design

Carine de Menezes Rebello, Anderson Rapello dos Santos, Idelfonso B. R. Nogueira

Deploying machine learning models across diverse well portfolios requires generalisation to wells with design parameters outside the training distribution. Current data-driven approaches to virtual flow metering (VFM) and bottomhole estimation typically treat each well independently or ignore the influence of well design on operational behaviour. We present WISE (Well Intelligence and Systems Engineering Foundation Model), a design-aware, physics-informed multi-task model that integrates three complementary mechanisms: Feature-wise Linear Modulation (FiLM) and cross-modal attention to condition operational embeddings on well design parameters; multi-task learning for simultaneous prediction of flow rates, bottomhole conditions, and flow regime classification; and structural mass conservation with soft physics constraints derived from well engineering principles. Evaluation on the ManyWells benchmark (2000 simulated wells, $10^6$ data points) demonstrates that design-aware models reduce VFM prediction error by up to $13\times$ compared to design-unaware baselines, and that physics constraints reduce negative flow predictions by 65%. Flow regime classification achieves 97.7% bottomhole accuracy, providing continuous well integrity monitoring without additional sensors. The methodology transfers to real operational data from five Equinor Volve producers (oil rate $R^2 = 0.89$, bottomhole pressure $R^2 = 0.98$, water rate $R^2 = 0.97$). The trained model additionally serves as a fast surrogate for integrity-aware well design optimisation over a 24-dimensional design space, with more than $1000\times$ speedup over drift-flux simulations. These results demonstrate that design awareness, physics enforcement, and multi-task learning are essential and complementary ingredients for foundation models intended to operate across large well portfolios.

CHEM-PHFeb 19, 2024
Molecule Generation and Optimization for Efficient Fragrance Creation

Bruno C. L. Rodrigues, Vinicius V. Santana, Sandris Murins et al.

This research introduces a Machine Learning-centric approach to replicate olfactory experiences, validated through experimental quantification of perfume perception. Key contributions encompass a hybrid model connecting perfume molecular structure to human olfactory perception. This model includes an AI-driven molecule generator (utilizing Graph and Generative Neural Networks), quantification and prediction of odor intensity, and refinery of optimal solvent and molecule combinations for desired fragrances. Additionally, a thermodynamic-based model establishes a link between olfactory perception and liquid-phase concentrations. The methodology employs Transfer Learning and selects the most suitable molecules based on vapor pressure and fragrance notes. Ultimately, a mathematical optimization problem is formulated to minimize discrepancies between new and target olfactory experiences. The methodology is validated by reproducing two distinct olfactory experiences using available experimental data.

CHEM-PHDec 6, 2023
Optimizing $CO_{2}$ Capture in Pressure Swing Adsorption Units: A Deep Neural Network Approach with Optimality Evaluation and Operating Maps for Decision-Making

Carine Menezes Rebello, Idelfonso B. R. Nogueira

This study presents a methodology for surrogate optimization of cyclic adsorption processes, focusing on enhancing Pressure Swing Adsorption units for carbon dioxide ($CO_{2}$) capture. We developed and implemented a multiple-input, single-output (MISO) framework comprising two deep neural network (DNN) models, predicting key process performance indicators. These models were then integrated into an optimization framework, leveraging particle swarm optimization (PSO) and statistical analysis to generate a comprehensive Pareto front representation. This approach delineated feasible operational regions (FORs) and highlighted the spectrum of optimal decision-making scenarios. A key aspect of our methodology was the evaluation of optimization effectiveness. This was accomplished by testing decision variables derived from the Pareto front against a phenomenological model, affirming the surrogate models reliability. Subsequently, the study delved into analyzing the feasible operational domains of these decision variables. A detailed correlation map was constructed to elucidate the interplay between these variables, thereby uncovering the most impactful factors influencing process behavior. The study offers a practical, insightful operational map that aids operators in pinpointing the optimal process location and prioritizing specific operational goals.