Marcelo Matheus Gauy

SD
h-index6
10papers
92citations
Novelty35%
AI Score40

10 Papers

21.6MAMay 11
Voter Model Meets Rumour Spreading: an FPRAS for Consensus Probabilities on Voter Models with Agnostic Nodes

Marcelo Matheus Gauy, Anna Abramishvili, Eduardo Colli et al.

Problems of consensus in multi-agent systems are often viewed as a series of independent, simultaneous local decisions made between a limited set of options, all aimed at reaching a global agreement. Key challenges in these protocols include estimating the likelihood of various outcomes and finding bounds for how long it may take to achieve consensus, if it occurs at all. To date, little attention has been given to the case where some agents have no initial opinion. In this paper, we introduce a variant of the consensus problem which includes what we call `agnostic' nodes and frame it as a combination of two known and well-studied processes: voter model and rumour spreading. We show (1) a martingale that describes the probability of consensus for a given colour, (2) bounds on the number of steps for the process to end using results from rumour spreading and voter models, (3) closed formulas for the probability of consensus in a few special cases, along with a polynomial-time algorithm for the case where the number of agnostic vertices is at most logarithmic and (4) that the computational complexity of estimating the probability with a Markov chain Monte Carlo process is $O(n^2 \log n)$ for general graphs and $O(n\log n)$ for Erdős-Rényi graphs, resulting in a fully polynomial-time randomized approximation scheme (FPRAS) for estimating the probabilities of consensus. Furthermore, we present experimental results suggesting that the number of runs needed for a given standard error decreases when the number of nodes increases.

SDOct 25, 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19

Marcelo Matheus Gauy, Marcelo Finger

This work explores speech as a biomarker and investigates the detection of respiratory insufficiency (RI) by analyzing speech samples. Previous work \cite{spira2021} constructed a dataset of respiratory insufficiency COVID-19 patient utterances and analyzed it by means of a convolutional neural network achieving an accuracy of $87.04\%$, validating the hypothesis that one can detect RI through speech. Here, we study how Transformer neural network architectures can improve the performance on RI detection. This approach enables construction of an acoustic model. By choosing the correct pretraining technique, we generate a self-supervised acoustic model, leading to improved performance ($96.53\%$) of Transformers for RI detection.

SDOct 26, 2022
Pretrained audio neural networks for Speech emotion recognition in Portuguese

Marcelo Matheus Gauy, Marcelo Finger

The goal of speech emotion recognition (SER) is to identify the emotional aspects of speech. The SER challenge for Brazilian Portuguese speech was proposed with short snippets of Portuguese which are classified as neutral, non-neutral female and non-neutral male according to paralinguistic elements (laughing, crying, etc). This dataset contains about $50$ minutes of Brazilian Portuguese speech. As the dataset leans on the small side, we investigate whether a combination of transfer learning and data augmentation techniques can produce positive results. Thus, by combining a data augmentation technique called SpecAugment, with the use of Pretrained Audio Neural Networks (PANNs) for transfer learning we are able to obtain interesting results. The PANNs (CNN6, CNN10 and CNN14) are pretrained on a large dataset called AudioSet containing more than $5000$ hours of audio. They were finetuned on the SER dataset and the best performing model (CNN10) on the validation set was submitted to the challenge, achieving an $F1$ score of $0.73$ up from $0.54$ from the baselines provided by the challenge. Moreover, we also tested the use of Transformer neural architecture, pretrained on about $600$ hours of Brazilian Portuguese audio data. Transformers, as well as more complex models of PANNs (CNN14), fail to generalize to the test set in the SER dataset and do not beat the baseline. Considering the limitation of the dataset sizes, currently the best approach for SER is using PANNs (specifically, CNN6 and CNN10).

SDJul 30, 2024
Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation

Marcelo Matheus Gauy, Natalia Hitomi Koza, Ricardo Mikio Morita et al.

We contrast high effectiveness of state of the art deep learning architectures designed for general audio classification tasks, refined for respiratory insufficiency (RI) detection and blood oxygen saturation (SpO$_2$) estimation and classification through automated audio analysis. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-score above 0.93. RI is a condition associated with low SpO$_2$ levels, commonly defined as the threshold SpO$_2$ <92%. While SpO$_2$ serves as a crucial determinant of RI, a medical doctor's diagnosis typically relies on multiple factors. These include respiratory frequency, heart rate, SpO$_2$ levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection, where these models achieve near perfect accuracy, surpassing previous results. Yet, for the regression task of estimating SpO$_2$ levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters. Additionally, Pearson correlation coefficients fail to surpass 0.3. As deep learning models perform better in classification than regression, we transform SpO$_2$-regression into a SpO$_2$-threshold binary classification problem, with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insights into a patient's RI status, but does not provide accurate information about actual SpO$_2$ levels, indicating a separation of domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.

SDDec 14, 2023
Acoustic models of Brazilian Portuguese Speech based on Neural Transformers

Marcelo Matheus Gauy, Marcelo Finger

An acoustic model, trained on a significant amount of unlabeled data, consists of a self-supervised learned speech representation useful for solving downstream tasks, perhaps after a fine-tuning of the model in the respective downstream task. In this work, we build an acoustic model of Brazilian Portuguese Speech through a Transformer neural network. This model was pretrained on more than $800$ hours of Brazilian Portuguese Speech, using a combination of pretraining techniques. Using a labeled dataset collected for the detection of respiratory insufficiency in Brazilian Portuguese speakers, we fine-tune the pretrained Transformer neural network on the following tasks: respiratory insufficiency detection, gender recognition and age group classification. We compare the performance of pretrained Transformers on these tasks with that of Transformers without previous pretraining, noting a significant improvement. In particular, the performance of respiratory insufficiency detection obtains the best reported results so far, indicating this kind of acoustic model as a promising tool for speech-as-biomarker approach. Moreover, the performance of gender recognition is comparable to the state of the art models in English.

SDNov 18, 2025
Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report

Daniel Oliveira de Brito, Letícia Gabriella de Souza, Marcelo Matheus Gauy et al.

This technical report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We fine-tuned Audio-MAE and three PANN architectures (CNN6, CNN10, CNN14) on the Coswara and COUGHVID datasets, evaluating both intra-dataset and cross-dataset generalization. We implemented a strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status. Intra-dataset results showed moderate performance, with Audio-MAE achieving the strongest result on Coswara (0.82 AUC, 0.76 F1-score), while all models demonstrated limited performance on Coughvid (AUC 0.58-0.63). Cross-dataset evaluation revealed severe generalization failure across all models (AUC 0.43-0.68), with Audio-MAE showing strong performance degradation (F1-score 0.00-0.08). Our experiments demonstrate that demographic balancing, while reducing apparent model performance, provides more realistic assessment of COVID-19 detection capabilities by eliminating demographic leakage - a confounding factor that inflate performance metrics. Additionally, the limited dataset sizes after balancing (1,219-2,160 samples) proved insufficient for deep learning models that typically require substantially larger training sets. These findings highlight fundamental challenges in developing generalizable audio-based COVID-19 detection systems and underscore the importance of rigorous demographic controls for clinically robust model evaluation.

AIMay 10, 2021
The Influence of Memory in Multi-Agent Consensus

David Kohan Marzagão, Luciana Basualdo Bonatto, Tiago Madeira et al.

Multi-agent consensus problems can often be seen as a sequence of autonomous and independent local choices between a finite set of decision options, with each local choice undertaken simultaneously, and with a shared goal of achieving a global consensus state. Being able to estimate probabilities for the different outcomes and to predict how long it takes for a consensus to be formed, if ever, are core issues for such protocols. Little attention has been given to protocols in which agents can remember past or outdated states. In this paper, we propose a framework to study what we call \emph{memory consensus protocol}. We show that the employment of memory allows such processes to always converge, as well as, in some scenarios, such as cycles, converge faster. We provide a theoretical analysis of the probability of each option eventually winning such processes based on the initial opinions expressed by agents. Further, we perform experiments to investigate network topologies in which agents benefit from memory on the expected time needed for consensus.

NEOct 11, 2019
Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions

Florian Meier, Asier Mujika, Marcelo Matheus Gauy et al.

Evolutionary Strategies (ES) are known to be an effective black-box optimization technique for deep neural networks when the true gradients cannot be computed, such as in Reinforcement Learning. We continue a recent line of research that uses surrogate gradients to improve the gradient estimation of ES. We propose a novel method to optimally incorporate surrogate gradient information. Our approach, unlike previous work, needs no information about the quality of the surrogate gradients and is always guaranteed to find a descent direction that is better than the surrogate gradient. This allows to iteratively use the previous gradient estimate as surrogate gradient for the current search point. We theoretically prove that this yields fast convergence to the true gradient for linear functions and show under simplifying assumptions that it significantly improves gradient estimates for general functions. Finally, we evaluate our approach empirically on MNIST and reinforcement learning tasks and show that it considerably improves the gradient estimation of ES at no extra computational cost.

LGFeb 11, 2019
Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning

Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika et al.

One of the central goals of Recurrent Neural Networks (RNNs) is to learn long-term dependencies in sequential data. Nevertheless, the most popular training method, Truncated Backpropagation through Time (TBPTT), categorically forbids learning dependencies beyond the truncation horizon. In contrast, the online training algorithm Real Time Recurrent Learning (RTRL) provides untruncated gradients, with the disadvantage of impractically large computational costs. Recently published approaches reduce these costs by providing noisy approximations of RTRL. We present a new approximation algorithm of RTRL, Optimal Kronecker-Sum Approximation (OK). We prove that OK is optimal for a class of approximations of RTRL, which includes all approaches published so far. Additionally, we show that OK has empirically negligible noise: Unlike previous algorithms it matches TBPTT in a real world task (character-level Penn TreeBank) and can exploit online parameter updates to outperform TBPTT in a synthetic string memorization task. Code availiable on github.

NEAug 16, 2018
The linear hidden subset problem for the (1+1) EA with scheduled and adaptive mutation rates

Hafsteinn Einarsson, Marcelo Matheus Gauy, Johannes Lengler et al.

We study unbiased $(1+1)$ evolutionary algorithms on linear functions with an unknown number $n$ of bits with non-zero weight. Static algorithms achieve an optimal runtime of $O(n (\ln n)^{2+ε})$, however, it remained unclear whether more dynamic parameter policies could yield better runtime guarantees. We consider two setups: one where the mutation rate follows a fixed schedule, and one where it may be adapted depending on the history of the run. For the first setup, we give a schedule that achieves a runtime of $(1\pm o(1))βn \ln n$, where $β\approx 3.552$, which is an asymptotic improvement over the runtime of the static setup. Moreover, we show that no schedule admits a better runtime guarantee and that the optimal schedule is essentially unique. For the second setup, we show that the runtime can be further improved to $(1\pm o(1)) e n \ln n$, which matches the performance of algorithms that know $n$ in advance. Finally, we study the related model of initial segment uncertainty with static position-dependent mutation rates, and derive asymptotically optimal lower bounds. This answers a question by Doerr, Doerr, and Kötzing.