Prithvi Raj

LG
h-index: 1
3 papers
34 citations
Novelty: 42%
AI Score: 40

3 Papers

LG | Jul 10, 2024
FACTS About Building Retrieval Augmented Generation-based Chatbots

Rama Akkiraju, Anbang Xu, Deepak Bora et al.

Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like LangChain and LlamaIndex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.
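
As a rough illustration of two of the pipeline steps named in the abstract (retrieving documents and honoring document access controls), the following is a minimal sketch and not the paper's implementation. The bag-of-words `embed` function, the `groups` field, and the sample documents are all assumptions made for this example; a real pipeline would use a learned embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption; stands in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, user_groups, top_k=2):
    """Filter by access control first, then rank the permitted documents."""
    q = embed(query)
    # Drop every document the user is not allowed to see before scoring.
    allowed = [d for d in docs if d["groups"] & user_groups]
    ranked = sorted(allowed, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:top_k]

# Hypothetical documents with hypothetical access groups.
docs = [
    {"id": "hr-1", "text": "vacation policy and paid leave benefits", "groups": {"all"}},
    {"id": "fin-1", "text": "quarterly earnings and revenue guidance", "groups": {"finance"}},
    {"id": "it-1", "text": "how to reset your vpn password", "groups": {"all"}},
]

general_hits = retrieve("quarterly earnings report", docs, user_groups={"all"})
finance_hits = retrieve("quarterly earnings report", docs, user_groups={"finance"})
```

Filtering before ranking means a restricted document can never reach the prompt, regardless of how well it matches the query. Returning the document `id` alongside the text is one simple way to support the references the abstract mentions.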

13.2 | LG | Apr 20
Kolmogorov-Arnold Energy Models: Fast, Interpretable Generative Modeling

Prithvi Raj

Generative models typically rely on either simple latent priors (e.g., Variational Autoencoders, VAEs), which are efficient but limited, or highly expressive iterative samplers (e.g., Diffusion and Energy-based Models), which are costly and opaque. We introduce the Kolmogorov-Arnold Energy Model (KAEM) to bridge this trade-off and provide a new avenue for latent-space interpretability. Based on a novel interpretation of the Kolmogorov-Arnold Representation Theorem, KAEM imposes a univariate latent structure that enables fast and exact inference via the inverse transform method. With a low-dimensional latent space and appropriate inductive biases, we show that importance sampling becomes a viable, unbiased, and highly efficient posterior inference method. For settings where importance sampling fails, we propose a population-based strategy that decomposes the posterior into a sequence of annealed distributions to improve mixing during sampling, a common pitfall in Energy-based Models. We present initial comparisons of KAEM against VAEs for standard vision datasets, demonstrating its potential for competitive sample quality, inference speed, and interpretability.
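
The inverse transform method mentioned in the abstract can be sketched for a single univariate density as follows. This is an illustrative approximation, not KAEM itself: the `energy` function, the bounded interval, and the gridded CDF are assumptions made for the example, and the grid makes the sampler approximate where the abstract describes an exact one.

```python
import bisect
import math
import random

def inverse_transform_sample(energy, lo, hi, n_samples, grid_size=512, rng=None):
    """Sample from p(z) proportional to exp(-energy(z)) on [lo, hi] via a gridded CDF."""
    rng = rng or random.Random(0)
    step = (hi - lo) / grid_size
    # Midpoint of each grid cell.
    grid = [lo + (i + 0.5) * step for i in range(grid_size)]
    weights = [math.exp(-energy(z)) for z in grid]
    total = sum(weights)

    # Cumulative distribution over cells, normalized to end at 1.
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift

    samples = []
    for _ in range(n_samples):
        u = rng.random()                    # uniform draw in [0, 1)
        idx = bisect.bisect_left(cdf, u)    # invert the CDF by binary search
        jitter = (rng.random() - 0.5) * step
        samples.append(grid[idx] + jitter)  # spread uniformly within the cell
    return samples

# Hypothetical quadratic energy, i.e. a standard normal truncated to [-4, 4].
samples = inverse_transform_sample(lambda z: 0.5 * z * z, -4.0, 4.0, 2000)
```

Each draw costs one uniform random number and one binary search, with no iterative chain to run. That is the kind of speed advantage a univariate latent structure makes available.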

LG | Jun 17, 2025
Kolmogorov-Arnold Energy Models: Fast and Interpretable Generative Modeling

Prithvi Raj

Learning an energy-based model (EBM) in the latent space of a top-down generative model offers a powerful framework for generation across many data modalities. However, it remains unclear how its interpretability can be used to guide model design, improve generative quality, and reduce training time. Moreover, the reliance on Langevin Monte Carlo (LMC) sampling presents challenges in efficiency and sampling multimodal latent distributions. We propose a novel adaptation of the Kolmogorov-Arnold representation theorem for generative modeling and introduce the Kolmogorov-Arnold Energy Model (KAEM) to take advantage of structural and inductive biases. By constraining the prior to univariate relationships, KAEM enables fast and exact inference via the inverse transform method. With the low dimensionality of the latent space and suitable inductive biases encoded, we demonstrate that importance sampling (IS) becomes a viable, unbiased, and highly efficient posterior sampler. For domains where IS fails, we introduce a strategy based on population-based LMC, decomposing the posterior into a sequence of annealed distributions to improve LMC mixing. KAEM balances common generative modeling trade-offs, offering fast inference, interpretability, and stable training, while being naturally suited to Zettascale Computing hardware.
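
To make the role of importance sampling as a posterior sampler concrete, here is a minimal sketch using the prior as the proposal. It is not the paper's estimator: the Gaussian prior, the Gaussian likelihood, and the function names are assumptions chosen so the answer can be checked analytically. The sketch uses the self-normalized variant, which is consistent rather than strictly unbiased at finite sample sizes.

```python
import math
import random

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def posterior_mean_is(log_likelihood, sample_prior, n_samples, rng):
    """Estimate E[z | x] by weighting prior samples with the likelihood."""
    zs = [sample_prior(rng) for _ in range(n_samples)]
    # With the prior as proposal, the importance weight is the likelihood.
    log_w = [log_likelihood(z) for z in zs]
    log_norm = log_sum_exp(log_w)
    weights = [math.exp(lw - log_norm) for lw in log_w]  # weights sum to 1
    mean = sum(w * z for w, z in zip(weights, zs))
    # Effective sample size: small values signal that IS is failing.
    ess = 1.0 / sum(w * w for w in weights)
    return mean, ess

# Hypothetical setup: z ~ N(0, 1), x | z ~ N(z, sigma^2), observed x = 1.
rng = random.Random(0)
x_obs, sigma = 1.0, 1.0
mean, ess = posterior_mean_is(
    lambda z: -0.5 * ((x_obs - z) / sigma) ** 2,
    lambda r: r.gauss(0.0, 1.0),
    20000,
    rng,
)
```

For this conjugate Gaussian case the exact posterior mean is x / (1 + sigma^2) = 0.5, which gives a reference for the estimate. A collapsing effective sample size is one practical indicator of the regime where the abstract falls back to annealed, population-based LMC.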