Mohammad Reza Rezaei

LG
h-index2
4papers
26citations
Novelty50%
AI Score32

4 Papers

17.0CLFeb 16, 2025
Vendi-RAG: Adaptively Trading-Off Diversity And Quality Significantly Improves Retrieval Augmented Generation With LLMs

Mohammad Reza Rezaei, Adji Bousso Dieng

Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires connecting information from multiple sources. This paper introduces Vendi-RAG, a framework based on an iterative process that jointly optimizes retrieval diversity and answer quality. This joint optimization leads to significantly higher accuracy for multi-hop QA tasks. Vendi-RAG leverages the Vendi Score (VS), a flexible similarity-based diversity metric, to promote semantic diversity in document retrieval. It then uses an LLM judge that evaluates candidate answers, generated after a reasoning step, and outputs a score that the retriever uses to balance relevance and diversity among the retrieved documents during each iteration. Experiments on three challenging datasets -- HotpotQA, MuSiQue, and 2WikiMultiHopQA -- demonstrate Vendi-RAG's effectiveness in multi-hop reasoning tasks. The framework achieves significant accuracy improvements over traditional single-step and multi-step RAG approaches, with accuracy increases reaching up to +4.2% on HotpotQA, +4.1% on 2WikiMultiHopQA, and +1.3% on MuSiQue compared to Adaptive-RAG, the current best baseline. The benefits of Vendi-RAG are even more pronounced as the number of retrieved documents increases. Finally, we evaluated Vendi-RAG across different LLM backbones, including GPT-3.5, GPT-4, and GPT-4o-mini, and observed consistent improvements, demonstrating that the framework's advantages are model-agnostic.

14.4LGFeb 7, 2025
The Alpha-Alternator: Dynamic Adaptation To Varying Noise Levels In Sequences Using The Vendi Score For Improved Robustness and Performance

Mohammad Reza Rezaei, Adji Bousso Dieng

Current state-of-the-art dynamical models, such as Mamba, assume the same level of noisiness for all elements of a given sequence, which limits their performance on noisy temporal data. In this paper, we introduce the $α$-Alternator, a novel generative model for time-dependent data that dynamically adapts to the complexity introduced by varying noise levels in sequences. The $α$-Alternator leverages the Vendi Score (VS), a flexible similarity-based diversity metric, to adjust, at each time step $t$, the influence of the sequence element at time $t$ and the latent representation of the dynamics up to that time step on the predicted future dynamics. This influence is captured by a parameter that is learned and shared across all sequences in a given dataset. The sign of this parameter determines the direction of influence. A negative value indicates a noisy dataset, where a sequence element that increases the VS is considered noisy, and the model relies more on the latent history when processing that element. Conversely, when the parameter is positive, a sequence element that increases the VS is considered informative, and the $α$-Alternator relies more on this new input than on the latent history when updating its predicted latent dynamics. The $α$-Alternator is trained using a combination of observation masking and Alternator loss minimization. Masking simulates varying noise levels in sequences, enabling the model to be more robust to these fluctuations and improving its performance in trajectory prediction, imputation, and forecasting. Our experimental results demonstrate that the $α$-Alternator outperforms both Alternators and state-of-the-art state-space models across neural decoding and time-series forecasting benchmarks.

9.2MLMay 20, 2024
Alternators For Sequence Modeling

Mohammad Reza Rezaei, Adji Bousso Dieng

This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and often outperform strong baselines such as Mambas, neural ODEs, and diffusion models in the domains we studied.

4.1LGMay 18, 2025
Alternators With Noise Models

Mohammad R. Rezaei, Adji Bousso Dieng

Alternators have recently been introduced as a framework for modeling time-dependent data. They often outperform other popular frameworks, such as state-space models and diffusion models, on challenging time-series tasks. This paper introduces a new Alternator model, called Alternator++, which enhances the flexibility of traditional Alternators by explicitly modeling the noise terms used to sample the latent and observed trajectories, drawing on the idea of noise models from the diffusion modeling literature. Alternator++ optimizes the sum of the Alternator loss and a noise-matching loss. The latter forces the noise trajectories generated by the two noise models to approximate the noise trajectories that produce the observed and latent trajectories. We demonstrate the effectiveness of Alternator++ in tasks such as density estimation, time series imputation, and forecasting, showing that it outperforms several strong baselines, including Mambas, ScoreGrad, and Dyffusion.