LG Sep 21, 2023
Expert-Aided Causal Discovery of Ancestral Graphs
Tiago da Silva, Bruna Bazaluk, Eliezer de Souza da Silva et al.
Causal discovery (CD) algorithms are notably brittle when data is scarce, inferring unreliable causal relations that may contradict expert knowledge, especially when considering latent confounders. Furthermore, the lack of uncertainty quantification in most CD methods hinders users from diagnosing and refining results. To address these issues, we introduce Ancestral GFlowNets (AGFNs). AGFN samples ancestral graphs (AGs) proportionally to a score-based belief distribution representing our epistemic uncertainty over the causal relationships. Building upon this distribution, we propose an elicitation framework for expert-driven assessment. This framework comprises an optimal experimental design to probe the expert and a scheme to incorporate the obtained feedback into AGFN. Our experiments show that: i) AGFN is competitive against other methods that address latent confounding on both synthetic and real-world datasets; and ii) our design for incorporating feedback from a (simulated) human expert or a Large Language Model (LLM) improves inference quality.
10.8 AI Mar 12
Anomaly detection in time-series via inductive biases in the latent space of conditional normalizing flows
David Baumgartner, Eliezer de Souza da Silva, Iñigo Urteaga
Deep generative models for anomaly detection in multivariate time-series are typically trained by maximizing data likelihood. However, likelihood in observation space measures marginal density rather than conformity to structured temporal dynamics, and therefore can assign high probability to anomalous or out-of-distribution samples. We address this structural limitation by relocating the notion of anomaly to a prescribed latent space. We introduce explicit inductive biases in conditional normalizing flows, modeling time-series observations within a discrete-time state-space framework that constrains latent representations to evolve according to prescribed temporal dynamics. Under this formulation, expected behavior corresponds to compliance with a specified distribution over latent trajectories, while anomalies are defined as violations of these dynamics. Anomaly detection is consequently reduced to a statistically grounded compliance test, such that observations are mapped to latent space and evaluated via goodness-of-fit tests against the prescribed latent evolution. This yields a principled decision rule that remains effective even in regions of high observation likelihood. Experiments on synthetic and real-world time-series demonstrate reliable detection of anomalies in frequency, amplitude, and observation noise, while providing interpretable diagnostics of model compliance.
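The compliance test described above can be illustrated with a small sketch. It assumes, purely for illustration, that the flow has already mapped a window of observations to latent innovations that should be i.i.d. standard normal under expected behavior; the abstract does not say which goodness-of-fit test or latent dynamics the authors use, so the one-sample Kolmogorov-Smirnov test and the names `ks_statistic` and `is_anomalous` are assumptions.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(samples):
    """One-sample Kolmogorov-Smirnov statistic against N(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def is_anomalous(latents, critical=1.36):
    """Flag a window whose latents reject the N(0, 1) hypothesis.

    1.36 / sqrt(n) is the usual asymptotic 5% critical value (assumed level).
    """
    return ks_statistic(latents) > critical / math.sqrt(len(latents))


if __name__ == "__main__":
    rng = random.Random(0)
    nominal = [rng.gauss(0.0, 1.0) for _ in range(500)]
    inflated = [3.0 * z for z in nominal]  # amplitude anomaly in latent space
    print(ks_statistic(nominal), ks_statistic(inflated), is_anomalous(inflated))
```

The decision depends only on how well the latents fit the prescribed distribution, not on the observation likelihood, which is the point the abstract makes.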
LG Oct 12, 2024
On Divergence Measures for Training GFlowNets
Tiago da Silva, Eliezer de Souza da Silva, Diego Mesquita
Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects, with applications in generative modeling for tasks in fields such as causal discovery, NLP, and drug discovery. Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) and a target (backward policy) distribution, which enforces certain flow-matching conditions. While this training procedure is closely related to variational inference (VI), directly attempting standard Kullback-Leibler (KL) divergence minimization can lead to provably biased and potentially high-variance estimators. Therefore, we first review four divergence measures, namely the Rényi-$\alpha$, Tsallis-$\alpha$, reverse KL, and forward KL divergences, and design statistically efficient estimators for their stochastic gradients in the context of training GFlowNets. Then, we verify that properly minimizing these divergences yields a provably correct and empirically effective training scheme, often leading to significantly faster convergence than previously proposed optimization schemes. To achieve this, we design control variates based on the REINFORCE leave-one-out and score-matching estimators to reduce the variance of the learning objectives' gradients. Our work contributes by narrowing the gap between GFlowNet training and generalized variational approximations, paving the way for algorithmic ideas informed by the divergence minimization viewpoint.
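To make the variance-reduction idea concrete, here is a toy sketch of the REINFORCE leave-one-out control variate on a single Bernoulli variable with logit `theta`. It is not the GFlowNet estimator from the paper; the objective `f`, the Bernoulli model, and the function names are assumptions chosen so the true gradient is known in closed form.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def score(x, theta):
    """d/dtheta log p(x) for x ~ Bernoulli(sigmoid(theta))."""
    return x - sigmoid(theta)


def sample(theta, n, rng):
    p = sigmoid(theta)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def reinforce_grad(xs, f, theta):
    """Plain score-function estimate of d/dtheta E[f(x)]."""
    return sum(f(x) * score(x, theta) for x in xs) / len(xs)


def rloo_grad(xs, f, theta):
    """Leave-one-out variant: each sample's baseline is the mean of f over
    the other samples, which keeps the estimator unbiased (needs n >= 2)."""
    n = len(xs)
    fs = [f(x) for x in xs]
    total = sum(fs)
    return sum(
        (fi - (total - fi) / (n - 1)) * score(x, theta)
        for fi, x in zip(fs, xs)
    ) / n


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda x: 10.0 + x  # large constant offset inflates plain REINFORCE variance
    xs = sample(0.0, 8, rng)
    # True gradient at theta = 0 is p * (1 - p) * (f(1) - f(0)) = 0.25.
    print(reinforce_grad(xs, f, 0.0), rloo_grad(xs, f, 0.0))
```

Both estimators target the same gradient; the leave-one-out baseline removes the constant offset in `f`, which is where most of the variance of the plain estimator comes from in this toy case.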
57.4 CL Apr 1
MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
Tiago Teixeira, Ana Carolina Erthal, Juan Belieni et al.
The use of large language models (LLMs) for complex mathematical reasoning is an emergent area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight models, but that their performance decreases for questions with figures or open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.
ML Oct 27, 2019
Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching
Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann et al.
The behavior of many Bayesian models used in machine learning critically depends on the choice of prior distributions, controlled by some hyperparameters that are typically selected by Bayesian optimization or cross-validation. This requires repeated, costly posterior inference. We provide an alternative for selecting good priors without carrying out posterior inference, building on the prior predictive distribution that marginalizes out the model parameters. We estimate virtual statistics for data generated by the prior predictive distribution and then optimize over the hyperparameters to learn ones for which these virtual statistics match target values provided by the user or estimated from (a subset of) the observed data. We apply the principle to probabilistic matrix factorization, for which good solutions for prior selection have been missing. We show that for Poisson factorization models we can analytically determine the hyperparameters, including the number of factors, that best replicate the target statistics, and we study empirically the sensitivity of the approach to model mismatch. We also present a model-independent procedure that determines the hyperparameters for general models by stochastic optimization, and demonstrate this extension in the context of hierarchical matrix factorization models.
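As a rough illustration of matching prior predictive statistics without posterior inference, the sketch below computes the mean and variance of a single count under an assumed Poisson-gamma factorization: `theta_k ~ Gamma(a, b)` and `beta_k ~ Gamma(c, d)` in shape-rate form, with `y ~ Poisson(sum_k theta_k * beta_k)` over `K` factors. The parameterization and the helper names are assumptions, and the paper matches more statistics than these two moments (it also determines the number of factors, which is not attempted here).

```python
import math


def gamma_moments(shape, rate):
    """Mean and variance of a Gamma(shape, rate) variable."""
    return shape / rate, shape / rate ** 2


def prior_predictive_moments(K, a, b, c, d):
    """Mean and variance of one entry y under the assumed model.

    Uses Var(y) = E[lam] + Var(lam) with lam = sum_k theta_k * beta_k,
    where the K products are independent and identically distributed.
    """
    mu_t, var_t = gamma_moments(a, b)
    mu_b, var_b = gamma_moments(c, d)
    mean = K * mu_t * mu_b
    var_lam = K * ((var_t + mu_t ** 2) * (var_b + mu_b ** 2) - (mu_t * mu_b) ** 2)
    return mean, mean + var_lam


def match_rate_to_mean(target_mean, K, a, c):
    """Shared rate b = d for which the prior predictive mean equals target_mean."""
    return math.sqrt(K * a * c / target_mean)


if __name__ == "__main__":
    rate = match_rate_to_mean(target_mean=0.5, K=2, a=1.0, c=1.0)
    print(rate, prior_predictive_moments(2, 1.0, rate, 1.0, rate))
```

Here the target mean could come from the user or from the empirical mean of the observed matrix; no posterior is ever computed.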
LG Sep 4, 2019
Augmented Memory Networks for Streaming-Based Active One-Shot Learning
Andreas Kvistad, Massimiliano Ruocco, Eliezer de Souza da Silva et al.
One of the major challenges in training deep architectures for predictive tasks is the scarcity and cost of labeled training data. Active Learning (AL) is one way of addressing this challenge. In stream-based AL, observations are continuously made available to a learner that has to decide whether to request a label or to make a prediction. The goal is to reduce the request rate while at the same time maximizing prediction performance. In previous research, reinforcement learning has been used for learning the AL request/prediction strategy. In our work, we propose to equip a reinforcement learning process with memory-augmented neural networks to enhance the one-shot capabilities. Moreover, we introduce Class Margin Sampling (CMS) as an extension of standard margin sampling to the reinforcement learning setting. This strategy aims to reduce training time and improve sample efficiency in the training process. We evaluate the proposed method on a classification task using empirical accuracy of label predictions and percentage of label requests. The results indicate that the proposed method, by making use of the memory-augmented networks and CMS in the training process, outperforms existing baselines.
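For readers unfamiliar with the baseline that CMS extends, the following sketch shows standard margin sampling in a stream-based setting: the learner requests a label when the gap between its two most probable classes is small. The fixed `threshold` and the function names are assumptions; the paper instead learns the request/prediction strategy with reinforcement learning, and CMS itself is not reproduced here.

```python
def margin(probs):
    """Gap between the two largest class probabilities."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def should_request_label(probs, threshold=0.1):
    """Request a label when the prediction is ambiguous (small margin)."""
    return margin(probs) < threshold


def request_rate(stream, threshold=0.1):
    """Fraction of stream items for which a label would be requested."""
    decisions = [should_request_label(p, threshold) for p in stream]
    return sum(decisions) / len(decisions)


if __name__ == "__main__":
    # Hypothetical predicted class probabilities arriving one at a time.
    stream = [
        [0.45, 0.40, 0.15],  # ambiguous: request a label
        [0.90, 0.05, 0.05],  # confident: predict
        [0.50, 0.48, 0.02],  # ambiguous: request a label
        [0.70, 0.20, 0.10],  # confident: predict
    ]
    print(request_rate(stream))
```

Raising the threshold trades a higher request rate for fewer unassisted predictions, which is the tension the abstract describes.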
IR Dec 4, 2018
Time is of the Essence: a Joint Hierarchical RNN and Point Process Model for Time and Item Predictions
Bjørnar Vassøy, Massimiliano Ruocco, Eliezer de Souza da Silva et al.
In recent years, session-based recommendation has emerged as an increasingly applicable type of recommendation. As sessions consist of sequences of events, this type of recommendation is a natural fit for Recurrent Neural Networks (RNNs). Several additions have been proposed for extending such models in order to handle specific problems or data. Two such extensions are (1) modeling of inter-session relations to capture long-term dependencies over user sessions, and (2) modeling temporal aspects of user-item interactions. The former allows session-based recommendation to utilize extended session history and inter-session information when providing new recommendations. The latter has been used both to provide state-of-the-art predictions of when the user will return to the service and to improve recommendations. In this work we combine these two extensions in a joint model for the tasks of recommendation and return-time prediction. The model consists of a Hierarchical RNN for inter-session and intra-session item recommendation, extended with a Point Process model for the time gaps between sessions. The experimental results indicate that the proposed model improves recommendations significantly on two datasets over a strong baseline, while simultaneously improving return-time predictions over a baseline return-time prediction model.