90.4LGMay 29
Universal Decision LearnersSridhar Mahadevan
Many theories of decision making -- planning, reinforcement learning, causal intervention, online learning, and game-theoretic equilibrium -- turn local information into globally coherent behavior. This paper proposes a common categorical formulation: a Universal Decision Learner (UDL) extends a partially specified decision functor from observed contexts to new contexts by a pair of universal constructions. Left Kan extensions express rollout, aggregation, and candidate generation; right Kan extensions express consistency, constraint satisfaction, and fixed-point semantics. The central claim is not that every decision problem has the same algorithm, but that many decision formalisms instantiate the same universal problem: extend local behavioral data canonically, then characterize the globally coherent extensions. We give the abstract UDL construction, prove its universal comparison property, define Kan-invariant behavioral equivalence and minimal abstractions, and show how Bellman equations, planning recursions, causal interventions, online regret, and equilibria arise as special cases. The supplementary material develops the reinforcement-learning specialization in more detail.
76.3MEMay 30
Causal Density FunctionsSridhar Mahadevan
We introduce causal density functions: Radon-Nikodym derivatives that compare interventional laws to observational laws and therefore act as local density ratios for causal effects. Whereas many causal-strength measures compare whole distributions after graph surgery, causal density functions provide a pointwise change-of-measure object that can be estimated, calibrated, and used to score directed influence. The basic identity \[ \mathbb{E}_{\mathrm{do}}[f(Y)] = \mathbb{E}_{\mathrm{obs}}\!\left[f(Y)ρ(X,Y)\right] \] makes causal density directly testable: if the estimated density ratio is correct, observational expectations reweighted by $ρ$ reproduce interventional expectations. We derive practical estimators for do-curves and directed edge scores, relate the construction to Radon-Nikodym/Kan semantics for conditioning and intervention, and evaluate the resulting estimators on synthetic and real perturbation benchmarks.
LGMar 29, 2023
An Over-parameterized Exponential RegressionYeqi Gao, Sridhar Mahadevan, Zhao Song
Over the past few years, there has been a significant amount of research focused on studying the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function. Given a set of data points with labels $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$ where $n$ denotes the number of the data. Here $F(W(t),x)$ can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r, x \rangle)$, where $m$ represents the number of neurons, and $w_r(t)$ are weights at time $t$. It's standard in literature that $a_r$ are the fixed weights and it's never changed during the training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$ with random Gaussian distributions, such that $w_r(0) \sim \mathcal{N}(0, I_d)$ and initialize $a_r$ from random sign distribution for each $r \in [m]$. Using the gradient descent algorithm, we can find a weight $W(T)$ such that $\| F(W(T), X) - y \|_2 \leq ε$ holds with probability $1-δ$, where $ε\in (0,0.1)$ and $m = Ω(n^{2+o(1)}\log(n/δ))$. To optimize the over-parameterization bound $m$, we employ several tight analysis techniques from previous studies [Song and Yang arXiv 2019, Munteanu, Omlor, Song and Woodruff ICML 2022].
DSApr 10, 2023
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature DimensionYichuan Deng, Sridhar Mahadevan, Zhao Song
Large language models (LLMs) have shown their power in different areas. Attention computation, as an important subroutine of LLMs, has also attracted interests in theory. Recently the static computation and dynamic maintenance of attention matrix has been studied by [Alman and Song 2023] and [Brand, Song and Zhou 2023] from both algorithmic perspective and hardness perspective. In this work, we consider the sparsification of the attention problem. We make one simplification which is the logit matrix is symmetric. Let $n$ denote the length of sentence, let $d$ denote the embedding dimension. Given a matrix $X \in \mathbb{R}^{n \times d}$, suppose $d \gg n$ and $\| X X^\top \|_{\infty} < r$ with $r \in (0,0.1)$, then we aim for finding $Y \in \mathbb{R}^{n \times m}$ (where $m\ll d$) such that \begin{align*} \| D(Y)^{-1} \exp( Y Y^\top ) - D(X)^{-1} \exp( X X^\top) \|_{\infty} \leq O(r) \end{align*} We provide two results for this problem. $\bullet$ Our first result is a randomized algorithm. It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^ω ) $ time, has $1-δ$ succeed probability, and chooses $m = O(n \log(n/δ))$. Here $\mathrm{nnz}(X)$ denotes the number of non-zero entries in $X$. We use $ω$ to denote the exponent of matrix multiplication. Currently $ω\approx 2.373$. $\bullet$ Our second result is a deterministic algorithm. It runs in $\widetilde{O}(\min\{\sum_{i\in[d]}\mathrm{nnz}(X_i)^2, dn^{ω-1}\} + n^{ω+1})$ time and chooses $m = O(n)$. Here $X_i$ denote the $i$-th column of matrix $X$. Our main findings have the following implication for applied LLMs task: for any super large feature dimension, we can reduce it down to the size nearly linear in length of sentence.
LGJul 17, 2023
Zero-th Order Algorithm for Softmax Attention OptimizationYichuan Deng, Zhihang Li, Sridhar Mahadevan et al.
Large language models (LLMs) have brought about significant transformations in human society. Among the crucial computations in LLMs, the softmax unit holds great importance. Its helps the model generating a probability distribution on potential subsequent words or phrases, considering a series of input words. By utilizing this distribution, the model selects the most probable next word or phrase, based on the assigned probabilities. The softmax unit assumes a vital function in LLM training as it facilitates learning from data through the adjustment of neural network weights and biases. With the development of the size of LLMs, computing the gradient becomes expensive. However, Zero-th Order method can approximately compute the gradient with only forward passes. In this paper, we present a Zero-th Order algorithm specifically tailored for Softmax optimization. We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs. By leveraging the Zeroth-Order method, our work contributes to the advancement of optimization techniques in the context of complex language models.
AIJul 6, 2022
On The Universality of Diagrams for Causal Inference and The Causal Reproducing PropertySridhar Mahadevan
We propose Universal Causality, an overarching framework based on category theory that defines the universal property that underlies causal inference independent of the underlying representational formalism used. More formally, universal causal models are defined as categories consisting of objects and morphisms between them representing causal influences, as well as structures for carrying out interventions (experiments) and evaluating their outcomes (observations). Functors map between categories, and natural transformations map between a pair of functors across the same two categories. Abstract causal diagrams in our framework are built using universal constructions from category theory, including the limit or co-limit of an abstract causal diagram, or more generally, the Kan extension. We present two foundational results in universal causal inference. The first result, called the Universal Causality Theorem (UCT), pertains to the universality of diagrams, which are viewed as functors mapping both objects and relationships from an indexing category of abstract causal diagrams to an actual causal model whose nodes are labeled by random variables, and edges represent functional or probabilistic relationships. UCT states that any causal inference can be represented in a canonical way as the co-limit of an abstract causal diagram of representable objects. UCT follows from a basic result in the theory of sheaves. The second result, the Causal Reproducing Property (CRP), states that any causal influence of a object X on another object Y is representable as a natural transformation between two abstract causal diagrams. CRP follows from the Yoneda Lemma, one of the deepest results in category theory. The CRP property is analogous to the reproducing property in Reproducing Kernel Hilbert Spaces that served as the foundation for kernel methods in machine learning.
AISep 13, 2022
Unifying Causal Inference and Reinforcement Learning using Higher-Order Category TheorySridhar Mahadevan
We present a unified formalism for structure discovery of causal models and predictive state representation (PSR) models in reinforcement learning (RL) using higher-order category theory. Specifically, we model structure discovery in both settings using simplicial objects, contravariant functors from the category of ordinal numbers into any category. Fragments of causal models that are equivalent under conditional independence -- defined as causal horns -- as well as subsequences of potential tests in a predictive state representation -- defined as predictive horns -- are both special cases of horns of a simplicial object, subsets resulting from the removal of the interior and the face opposite a particular vertex. Latent structure discovery in both settings involve the same fundamental mathematical problem of finding extensions of horns of simplicial objects through solving lifting problems in commutative diagrams, and exploiting weak homotopies that define higher-order symmetries. Solutions to the problem of filling "inner" vs "outer" horns leads to various notions of higher-order categories, including weak Kan complexes and quasicategories. We define the abstract problem of structure discovery in both settings in terms of adjoint functors between the category of universal causal models or universal decision models and its simplicial object representation.
LGApr 23, 2022
Smoothed Online Combinatorial Optimization Using Imperfect PredictionsKai Wang, Zhao Song, Georgios Theocharous et al.
Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds. We study smoothed online combinatorial optimization problems when an imperfect predictive model is available, where the model can forecast the future cost functions with uncertainty. We show that using predictions to plan for a finite time horizon leads to regret dependent on the total predictive uncertainty and an additional switching cost. This observation suggests choosing a suitable planning window to balance between uncertainty and switching cost, which leads to an online algorithm with guarantees on the upper and lower bounds of the cumulative regret. Empirically, our algorithm shows a significant improvement in cumulative regret compared to other baselines in synthetic online distributed streaming problems.
64.8LGMay 26
Kan Extension Transformers: A Categorical Unification of Attention, Diffusion, and Predict-Detach Self-ConditioningSridhar Mahadevan
We propose Kan Extension Transformers (KETs) as a unifying categorical framework for a diverse group of Transformer implementations. The core claim is that a Transformer layer can be viewed as a weighted structured extension operator: standard attention is the singleton-neighborhood case, Geometric Transformer style incidence mixing is a sparse edge-restricted case, and KET is the higher-order simplicial case. This lens also clarifies a bridge to diffusion-style completion. When the extension operator acts on detached predictive carriers instead of teacher-forced hidden states, it becomes a valid self-conditioning mechanism that exposes noncausal structure without leaking gold future tokens. We include a comprehensive experimental validation of 12 different Transformer implementations varying across strict-causal and predict-detach regimes on Penn Treebank, WikiText-2, and WikiText-103. In the strict-causal setting, quadratic KET is the strongest model among the compared causal architectures on WikiText-2 and WikiText-103. Across all datasets, however, the largest gains come from the predict-detach regime rather than from changing the neighborhood family alone.
AIDec 18, 2022
A Layered Architecture for Universal CausalitySridhar Mahadevan
We propose a layered hierarchical architecture called UCLA (Universal Causality Layered Architecture), which combines multiple levels of categorical abstraction for causal inference. At the top-most level, causal interventions are modeled combinatorially using a simplicial category of ordinal numbers. At the second layer, causal models are defined by a graph-type category. The non-random ``surgical" operations on causal structures, such as edge deletion, are captured using degeneracy and face operators from the simplicial layer above. The third categorical abstraction layer corresponds to the data layer in causal inference. The fourth homotopy layer comprises of additional structure imposed on the instance layer above, such as a topological space, which enables evaluating causal models on datasets. Functors map between every pair of layers in UCLA. Each functor between layers is characterized by a universal arrow, which defines an isomorphism between every pair of categorical layers. These universal arrows define universal elements and representations through the Yoneda Lemma, and in turn lead to a new category of elements based on a construction introduced by Grothendieck. Causal inference between each pair of layers is defined as a lifting problem, a commutative diagram whose objects are categories, and whose morphisms are functors that are characterized as different types of fibrations. We illustrate the UCLA architecture using a range of examples, including integer-valued multisets that represent a non-graphical framework for conditional independence, and causal models based on graphs and string diagrams using symmetric monoidal categories. We define causal effect in terms of the homotopy colimit of the nerve of the category of elements.
MENov 3, 2022
Privacy Aware Experiments without CookiesShiv Shankar, Ritwik Sinha, Saayan Mitra et al.
Consider two brands that want to jointly test alternate web experiences for their customers with an A/B test. Such collaborative tests are today enabled using \textit{third-party cookies}, where each brand has information on the identity of visitors to another website. With the imminent elimination of third-party cookies, such A/B tests will become untenable. We propose a two-stage experimental design, where the two brands only need to agree on high-level aggregate parameters of the experiment to test the alternate experiences. Our design respects the privacy of customers. We propose an estimater of the Average Treatment Effect (ATE), show that it is unbiased and theoretically compute its variance. Our demonstration describes how a marketer for a brand can design such an experiment and analyze the results. On real and simulated data, we show that the approach provides valid estimate of the ATE with low variance and is robust to the proportion of visitors overlapping across the brands.
AIJan 8
Categorical Belief Propagation: Sheaf-Theoretic Inference via Descent and HolonomyEnrique ter Horst, Sridhar Mahadevan, Juan Diego Zambrano
We develop a categorical foundation for belief propagation on factor graphs. We construct the free hypergraph category \(\Syn_Σ\) on a typed signature and prove its universal property, yielding compositional semantics via a unique functor to the matrix category \(\cat{Mat}_R\). Message-passing is formulated using a Grothendieck fibration \(\int\Msg \to \cat{FG}_Σ\) over polarized factor graphs, with schedule-indexed endomorphisms defining BP updates. We characterize exact inference as effective descent: local beliefs form a descent datum when compatibility conditions hold on overlaps. This framework unifies tree exactness, junction tree algorithms, and loopy BP failures under sheaf-theoretic obstructions. We introduce HATCC (Holonomy-Aware Tree Compilation), an algorithm that detects descent obstructions via holonomy computation on the factor nerve, compiles non-trivial holonomy into mode variables, and reduces to tree BP on an augmented graph. Complexity is \(O(n^2 d_{\max} + c \cdot k_{\max} \cdot δ_{\max}^3 + n \cdot δ_{\max}^2)\) for \(n\) factors and \(c\) fundamental cycles. Experimental results demonstrate exact inference with significant speedup over junction trees on grid MRFs and random graphs, along with UNSAT detection on satisfiability instances.
54.7AIMay 13
PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and ModelsSridhar Mahadevan
Large language models can extract local causal claims from text, but those claims become more useful when organized as persistent, navigable world models rather than as flat summaries. We introduce PROMETHEUS, a framework that turns retrieved literature, filings, reviews, reports, agent traces, source data, code, simulations, and scientific models into causal atlases: sheaf-like families of local causal predictive-state models over an explicit cover of a research substrate. Each local region contains causal episodes, structured claim tables, predictive tests, support statistics, and provenance; restriction maps compare overlapping regions; gluing diagnostics expose agreement, drift, contradiction, and underdetermination. The resulting Topos World Model is not a single universal graph. It is a research instrument for navigating what a corpus says, where it says it, how strongly it is supported, and where local claims fail to assemble into a coherent global view. Three literature-atlas case studies -- ocean-temperature impacts on marine populations, GLP-1 weight-loss evidence, and resveratrol/red-wine health-benefit claims -- illustrate deep causal research from text with explicit locality, evidence, persistent state, and gluing tension. Four grounded-counterfactual case studies -- a Nature Climate Change microplastics forcing paper, an Indus Valley hydrology paper with VIC-derived figure data and model code, the canonical Sachs protein-signaling study with single-cell perturbation data, and a Nature singing-mouse study with MAPseq projection matrices -- show a stronger mode: when a paper ships source data, simulation outputs, or code, PROMETHEUS can evaluate a counterfactual against that scientific substrate and then rebuild the sheaf world model around the
AIAug 23, 2022
Categoroids: Universal Conditional IndependenceSridhar Mahadevan
Conditional independence has been widely used in AI, causal inference, machine learning, and statistics. We introduce categoroids, an algebraic structure for characterizing universal properties of conditional independence. Categoroids are defined as a hybrid of two categories: one encoding a preordered lattice structure defined by objects and arrows between them; the second dual parameterization involves trigonoidal objects and morphisms defining a conditional independence structure, with bridge morphisms providing the interface between the binary and ternary structures. We illustrate categoroids using three well-known examples of axiom sets: graphoids, integer-valued multisets, and separoids. Functoroids map one categoroid to another, preserving the relationships defined by all three types of arrows in the co-domain categoroid. We describe a natural transformation across functoroids, which is natural across regular objects and trigonoidal objects, to construct universal representations of conditional independence.. We use adjunctions and monads between categoroids to abstractly characterize faithfulness of graphical and non-graphical representations of conditional independence.
DBJan 13
CSQL: Mapping Documents into Causal DatabasesSridhar Mahadevan
We describe a novel system, CSQL, which automatically converts a collection of unstructured text documents into an SQL-queryable causal database (CDB). A CDB differs from a traditional DB: it is designed to answer "why'' questions via causal interventions and structured causal queries. CSQL builds on our earlier system, DEMOCRITUS, which converts documents into thousands of local causal models derived from causal discourse. Unlike RAG-based systems or knowledge-graph based approaches, CSQL supports causal analysis over document collections rather than purely associative retrieval. For example, given an article on the origins of human bipedal walking, CSQL enables queries such as: "What are the strongest causal influences on bipedalism?'' or "Which variables act as causal hubs with the largest downstream influence?'' Beyond single-document case studies, we show that CSQL can also ingest RAG/IE-compiled causal corpora at scale by compiling the Testing Causal Claims (TCC) dataset of economics papers into a causal database containing 265,656 claim instances spanning 45,319 papers, 44 years, and 1,575 reported method strings, thereby enabling corpus-level causal queries and longitudinal analyses in CSQL. Viewed abstractly, CSQL functions as a compiler from unstructured documents into a causal database equipped with a principled algebra of queries, and can be applied broadly across many domains ranging from business, humanities, and science.
AIDec 8, 2025
Large Causal Models from Large Language ModelsSridhar Mahadevan
We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today's large language models (LLMs). We describe our ongoing experiments with an implemented system called DEMOCRITUS (Decentralized Extraction of Manifold Ontologies of Causal Relations Integrating Topos Universal Slices) aimed at building, organizing, and visualizing LCMs that span disparate domains extracted from carefully targeted textual queries to LLMs. DEMOCRITUS is methodologically distinct from traditional narrow domain and hypothesis centered causal inference that builds causal models from experiments that produce numerical data. A high-quality LLM is used to propose topics, generate causal questions, and extract plausible causal statements from a diverse range of domains. The technical challenge is then to take these isolated, fragmented, potentially ambiguous and possibly conflicting causal claims, and weave them into a coherent whole, converting them into relational causal triples and embedding them into a LCM. Addressing this technical challenge required inventing new categorical machine learning methods, which we can only briefly summarize in this paper, as it is focused more on the systems side of building DEMOCRITUS. We describe the implementation pipeline for DEMOCRITUS comprising of six modules, examine its computational cost profile to determine where the current bottlenecks in scaling the system to larger models. We describe the results of using DEMOCRITUS over a wide range of domains, spanning archaeology, biology, climate change, economics, medicine and technology. We discuss the limitations of the current DEMOCRITUS system, and outline directions for extending its capabilities.
AIFeb 2, 2024
Universal Imitation GamesSridhar Mahadevan
Alan Turing proposed in 1950 a framework called an imitation game to decide if a machine could think. Using mathematics developed largely after Turing -- category theory -- we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants are trying to imitate "teacher" participants over the long run. In evolutionary UIGs, the participants are competing against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory -- in particular, two influential results by Yoneda -- to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants are learning by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by conductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.
LGAug 20, 2025
Universal Reinforcement Learning in Coalgebras: Asynchronous Stochastic Computation via ConductionSridhar Mahadevan
In this paper, we introduce a categorial generalization of RL, termed universal reinforcement learning (URL), building on powerful mathematical abstractions from the study of coinduction on non-well-founded sets and universal coalgebras, topos theory, and categorial models of asynchronous parallel distributed computation. In the first half of the paper, we review the basic RL framework, illustrate the use of categories and functors in RL, showing how they lead to interesting insights. In particular, we also introduce a standard model of asynchronous distributed minimization proposed by Bertsekas and Tsitsiklis, and describe the relationship between metric coinduction and their proof of the Asynchronous Convergence Theorem. The space of algorithms for MDPs or PSRs can be modeled as a functor category, where the co-domain category forms a topos, which admits all (co)limits, possesses a subobject classifier, and has exponential objects. In the second half of the paper, we move on to universal coalgebras. Dynamical system models, such as Markov decision processes (MDPs), partially observed MDPs (POMDPs), a predictive state representation (PSRs), and linear dynamical systems (LDSs) are all special types of coalgebras. We describe a broad family of universal coalgebras, extending the dynamic system models studied previously in RL. The core problem in finding fixed points in RL to determine the exact or approximate (action) value function is generalized in URL to determining the final coalgebra asynchronously in a parallel distributed manner.
AIOct 27, 2025
Decentralized Causal Discovery using Judo CalculusSridhar Mahadevan
We describe a theory and implementation of an intuitionistic decentralized framework for causal discovery using judo calculus, which is formally defined as j-stable causal inference using j-do-calculus in a topos of sheaves. In real-world applications -- from biology to medicine and social science -- causal effects depend on regime (age, country, dose, genotype, or lab protocol). Our proposed judo calculus formalizes this context dependence formally as local truth: a causal claim is proven true on a cover of regimes, not everywhere at once. The Lawvere-Tierney modal operator j chooses which regimes are relevant; j-stability means the claim holds constructively and consistently across that family. We describe an algorithmic and implementation framework for judo calculus, combining it with standard score-based, constraint-based, and gradient-based causal discovery methods. We describe experimental results on a range of domains, from synthetic to real-world datasets from biology and economics. Our experimental results show the computational efficiency gained by the decentralized nature of sheaf-theoretic causal discovery, as well as improved performance over classical causal discovery methods.
LOOct 20, 2025
Intuitionistic $j$-Do-Calculus in Topos Causal ModelsSridhar Mahadevan
In this paper, we generalize Pearl's do-calculus to an Intuitionistic setting called $j$-stable causal inference inside a topos of sheaves. Our framework is an elaboration of the recently proposed framework of Topos Causal Models (TCMs), where causal interventions are defined as subobjects. We generalize the original setting of TCM using the Lawvere-Tierney topology on a topos, defined by a modal operator $j$ on the subobject classifier $Ω$. We introduce $j$-do-calculus, where we replace global truth with local truth defined by Kripke-Joyal semantics, and formalize causal reasoning as structure-preserving morphisms that are stable along $j$-covers. $j$-do-calculus is a sound rule system whose premises and conclusions are formulas of the internal Intuitionistic logic of the causal topos. We define $j$-stability for conditional independences and interventional claims as local truth in the internal logic of the causal topos. We give three inference rules that mirror Pearl's insertion/deletion and action/observation exchange, and we prove soundness in the Kripke-Joyal semantics. A companion paper in preparation will describe how to estimate the required entities from data and instantiate $j$-do with standard discovery procedures (e.g., score-based and constraint-based methods), and will include experimental results on how to (i) form data-driven $j$-covers (via regime/section constructions), (ii) compute chartwise conditional independences after graph surgeries, and (iii) glue them to certify the premises of the $j$-do rules in practice
AIAug 25, 2025
Consciousness as a FunctorSridhar Mahadevan
We propose a novel theory of consciousness as a functor (CF) that receives and transmits contents from unconscious memory into conscious memory. Our CF framework can be seen as a categorial formulation of the Global Workspace Theory proposed by Baars. CF models the ensemble of unconscious processes as a topos category of coalgebras. The internal language of thought in CF is defined as a Multi-modal Universal Mitchell-Benabou Language Embedding (MUMBLE). We model the transmission of information from conscious short-term working memory to long-term unconscious memory using our recently proposed Universal Reinforcement Learning (URL) framework. To model the transmission of information from unconscious long-term memory into resource-constrained short-term memory, we propose a network economic model.
CLAug 7, 2025
A Rose by Any Other Name Would Smell as Sweet: Categorical Homotopy Theory for Large Language ModelsSridhar Mahadevan
Natural language is replete with superficially different statements, such as ``Charles Darwin wrote" and ``Charles Darwin is the author of", which carry the same meaning. Large language models (LLMs) should generate the same next-token probabilities in such cases, but usually do not. Empirical workarounds have been explored, such as using k-NN estimates of sentence similarity to produce smoothed estimates. In this paper, we tackle this problem more abstractly, introducing a categorical homotopy framework for LLMs. We introduce an LLM Markov category to represent probability distributions in language generated by an LLM, where the probability of a sentence, such as ``Charles Darwin wrote" is defined by an arrow in a Markov category. However, this approach runs into difficulties as language is full of equivalent rephrases, and each generates a non-isomorphic arrow in the LLM Markov category. To address this fundamental problem, we use categorical homotopy techniques to capture ``weak equivalences" in an LLM Markov category. We present a detailed overview of application of categorical homotopy to LLMs, from higher algebraic K-theory to model categories, building on powerful theoretical results developed over the past half a century.
AIAug 5, 2025
Topos Causal ModelsSridhar Mahadevan
We propose topos causal models (TCMs), a novel class of causal models that exploit the key properties of a topos category: they are (co)complete, meaning all (co)limits exist, they admit a subobject classifier, and allow exponential objects. The main goal of this paper is to show that these properties are central to many applications in causal inference. For example, subobject classifiers allow a categorical formulation of causal intervention, which creates sub-models. Limits and colimits allow causal diagrams of arbitrary complexity to be ``solved", using a novel interpretation of causal approximation. Exponential objects enable reasoning about equivalence classes of operations on causal models, such as covered edge reversal and causal homotopy. Analogous to structural causal models (SCMs), TCMs are defined by a collection of functions, each defining a ``local autonomous" causal mechanism that assemble to induce a unique global function from exogenous to endogenous variables. Since the category of TCMs is (co)complete, which we prove in this paper, every causal diagram has a ``solution" in the form of a (co)limit: this implies that any arbitrary causal model can be ``approximated" by some global function with respect to the morphisms going into or out of the diagram. Natural transformations are crucial in measuring the quality of approximation. In addition, we show that causal interventions are modeled by subobject classifiers: any sub-model is defined by a monic arrow into its parent model. Exponential objects permit reasoning about entire classes of causal equivalences and interventions. Finally, as TCMs form a topos, they admit an internal logic defined as a Mitchell-Benabou language with an associated Kripke-Joyal semantics. We show how to reason about causal models in TCMs using this internal logic.
AIAug 5, 2025
Topos Theory for Generative AI and LLMsSridhar Mahadevan
We propose the design of novel categorical generative AI architectures (GAIAs) using topos theory, a type of category that is ``set-like": a topos has all (co)limits, is Cartesian closed, and has a subobject classifier. Previous theoretical results on the Transformer model have shown that it is a universal sequence-to-sequence function approximator, and dense in the space of all continuous functions with compact support on the Euclidean space of embeddings of tokens. Building on this theoretical result, we explore novel architectures for LLMs that exploit the property that the category of LLMs, viewed as functions, forms a topos. Previous studies of large language models (LLMs) have focused on daisy-chained linear architectures or mixture-of-experts. In this paper, we use universal constructions in category theory to construct novel LLM architectures based on new types of compositional structures. In particular, these new compositional structures are derived from universal properties of LLM categories, and include pullback, pushout, (co) equalizers, exponential objects, and subobject classifiers. We theoretically validate these new compositional structures by showing that the category of LLMs is (co)complete, meaning that all diagrams have solutions in the form of (co)limits. Building on this completeness result, we then show that the category of LLMs forms a topos, a ``set-like" category, which requires showing the existence of exponential objects as well as subobject classifiers. We use a functorial characterization of backpropagation to define a potential implementation of an LLM topos architecture.
AIFeb 28, 2024
GAIA: Categorical Foundations of Generative AISridhar Mahadevan
In this paper, we propose GAIA, a generative AI architecture based on category theory. GAIA is based on a hierarchical model where modules are organized as a simplicial complex. Each simplicial complex updates its internal parameters biased on information it receives from its superior simplices and in turn relays updates to its subordinate sub-simplices. Parameter updates are formulated in terms of lifting diagrams over simplicial sets, where inner and outer horn extensions correspond to different types of learning problems. Backpropagation is modeled as an endofunctor over the category of parameters, leading to a coalgebraic formulation of deep learning.
AIOct 28, 2021
Universal Decision ModelsSridhar Mahadevan
Humans are universal decision makers: we reason causally to understand the world; we act competitively to gain advantage in commerce, games, and war; and we are able to learn to make better decisions through trial and error. In this paper, we propose Universal Decision Model (UDM), a mathematical formalism based on category theory. Decision objects in a UDM correspond to instances of decision tasks, ranging from causal models and dynamical systems such as Markov decision processes and predictive state representations, to network multiplayer games and Witsenhausen's intrinsic models, which generalizes all these previous formalisms. A UDM is a category of objects, which include decision objects, observation objects, and solution objects. Bisimulation morphisms map between decision objects that capture structure-preserving abstractions. We formulate universal properties of UDMs, including information integration, decision solvability, and hierarchical abstraction. We describe universal functorial representations of UDMs, and propose an algorithm for computing the minimal object in a UDM using algebraic topology. We sketch out an application of UDMs to causal inference in network economics, using a complex multiplayer producer-consumer two-sided marketplace.
OCSep 20, 2021
Causal Inference in Network EconomicsSridhar Mahadevan
Network economics is the study of a rich class of equilibrium problems that occur in the real world, from traffic management to supply chains and two-sided online marketplaces. In this paper we explore causal inference in network economics, building on the mathematical framework of variational inequalities, which is a generalization of classical optimization. Our framework can be viewed as a synthesis of the well-known variational inequality formalism with the broad principles of causal inference
ATSep 20, 2021
Causal HomotopySridhar Mahadevan
We characterize homotopical equivalences between causal DAG models, exploiting the close connections between partially ordered set representations of DAGs (posets) and finite Alexandroff topologies. Alexandroff spaces yield a directional topological space: the topology is defined by a unique minimal basis defined by an open set for each variable x, specified as the intersection of all open sets containing x. Alexandroff spaces induce a (reflexive, transitive) preorder. Alexandroff spaces satisfying the Kolmogorov T0 separation criterion, where open sets distinguish variables, converts the preordering into a partial ordering. Our approach broadly is to construct a topological representation of posets from data, and then use the poset representation to build a conventional DAG causal model. We illustrate our framework by showing how it unifies disparate algorithms and case studies proposed previously. Topology plays two key roles in causal discovery. First, topological separability constraints on datasets have been used in several previous approaches to infer causal structure from observations and interventions. Second, a diverse range ofgraphical models used to represent causal structures can be represented in a unified way in terms of a topological representation of the induced poset structure. We show that the homotopy theory of Alexandroff spaces can be exploited to significantly efficiently reduce the number of possible DAG structures, reducing the search space by several orders of magnitude.
AISep 20, 2021
Asymptotic Causal InferenceSridhar Mahadevan
We investigate causal inference in the asymptotic regime as the number of variables approaches infinity using an information-theoretic framework. We define structural entropy of a causal model in terms of its description complexity measured by the logarithmic growth rate, measured in bits, of all directed acyclic graphs (DAGs), parameterized by the edge density d. Structural entropy yields non-intuitive predictions. If we randomly sample a DAG from the space of all models, in the range d = (0, 1/8), almost surely the model is a two-layer DAG! Semantic entropy quantifies the reduction in entropy where edges are removed by causal intervention. Semantic causal entropy is defined as the f-divergence between the observational distribution and the interventional distribution P', where a subset S of edges are intervened on to determine their causal influence. We compare the decomposability properties of semantic entropy for different choices of f-divergences, including KL-divergence, squared Hellinger distance, and total variation distance. We apply our framework to generalize a recently popular bipartite experimental design for studying causal inference on large datasets, where interventions are carried out on one set of variables (e.g., power plants, items in an online store), but outcomes are measured on a disjoint set of variables (residents near power plants, or shoppers). We generalize bipartite designs to k-partite designs, and describe an optimization framework for finding the optimal k-level DAG architecture for any value of d \in (0, 1/2). As edge density increases, a sequence of phase transitions occur over disjoint intervals of d, with deeper DAG architectures emerging for larger values of d. We also give a quantitative bound on the number of samples needed to reliably test for average causal influence for a k-partite design.
LGSep 19, 2021
Multiscale Manifold WarpingSridhar Mahadevan, Anup Rao, Georgios Theocharous et al.
Many real-world applications require aligning two temporal sequences, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination. Dynamic Time Warping (DTW) is a popular alignment method, but can fail on high-dimensional real-world data where the dimensions of aligned sequences are often unequal. In this paper, we show that exploiting the multiscale manifold latent structure of real-world data can yield improved alignment. We introduce a novel framework called Warping on Wavelets (WOW) that integrates DTW with a a multi-scale manifold learning framework called Diffusion Wavelets. We present a theoretical analysis of the WOW family of algorithms and show that it outperforms previous state of the art methods, such as canonical time warping (CTW) and manifold warping, on several real-world datasets.
LGJun 6, 2020
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample ComplexityBo Liu, Ian Gemp, Mohammad Ghavamzadeh et al.
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal ``mirror maps'' to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
LGJun 6, 2020
Regularized Off-Policy TD-LearningBo Liu, Sridhar Mahadevan, Ji Liu
We present a novel $l_1$ regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented. A variety of experiments are presented to illustrate the off-policy convergence, sparse feature selection capability and low computational cost of the RO-TD algorithm.
LGJun 6, 2020
Finite-Sample Analysis of Proximal Gradient TD AlgorithmsBo Liu, Ji Liu, Mohammad Ghavamzadeh et al.
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been not much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms w.r.t.~a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
LGMay 17, 2020
Optimizing for the Future in Non-Stationary MDPsYash Chandak, Georgios Theocharous, Shiv Shankar et al.
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to the counter-factual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques, on three simulated problems motivated by real-world applications.
LGAug 4, 2018
Global Convergence to the Equilibrium of GANs using Variational InequalitiesIan Gemp, Sridhar Mahadevan
In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions that are ignored by gradient descent can be critical in equilibrium problems. Equilibrium problems have drawn heightened attention in machine learning due to the emergence of the Generative Adversarial Network (GAN). We use the framework of Variational Inequalities to analyze popular training algorithms for a fundamental GAN variant: the Wasserstein Linear-Quadratic GAN. We show that the steepest descent direction causes divergence from the equilibrium, and convergence to the equilibrium is achieved through following a particular orthogonal direction. We call this successful technique Crossing-the-Curl, named for its mathematical derivation as well as its intuition: identify the game's axis of rotation and move "across" space in the direction towards smaller "curling".
LGApr 28, 2018
A Unified Framework for Domain Adaptation using Metric Learning on ManifoldsSridhar Mahadevan, Bamdev Mishra, Shalini Ghosh
We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds. Our approach is based on formulating transfer from source to target as a problem of geometric mean metric learning on manifolds. Specifically, we exploit the curved Riemannian manifold geometry of symmetric positive definite (SPD) covariance matrices. We exploit a simple but important observation that as the space of covariance matrices is both a Riemannian space as well as a homogeneous space, the shortest path geodesic between two covariances on the manifold can be computed analytically. Statistics on the SPD matrix manifold, such as the geometric mean of two matrices can be reduced to solving the well-known Riccati equation. We show how the Ricatti-based solution can be constrained to not only reduce the statistical differences between the source and target domains, such as aligning second order covariances and minimizing the maximum mean discrepancy, but also the underlying geometry of the source and target domains using diffusions on the underlying source and target manifolds. A key strength of our proposed approach is that it enables integrating multiple sources of variation between source and target in a unified way, by reducing the combined objective function to a nested set of Ricatti equations where the solution can be represented by a cascaded series of geometric mean computations. In addition to showing the theoretical optimality of our solution, we present detailed experiments using standard transfer learning testbeds from computer vision comparing our proposed algorithms to past work in domain adaptation, showing improved results over a large variety of previous methods.
GTOct 19, 2017
Online Monotone GamesIan Gemp, Sridhar Mahadevan
Algorithmic game theory (AGT) focuses on the design and analysis of algorithms for interacting agents, with interactions rigorously formalized within the framework of games. Results from AGT find applications in domains such as online bidding auctions for web advertisements and network routing protocols. Monotone games are games where agent strategies naturally converge to an equilibrium state. Previous results in AGT have been obtained for convex, socially-convex, or smooth games, but not monotone games. Our primary theoretical contributions are defining the monotone game setting and its extension to the online setting, a new notion of regret for this setting, and accompanying algorithms that achieve sub-linear regret. We demonstrate the utility of online monotone game theory on a variety of problem domains including variational inequalities, reinforcement learning, and generative adversarial networks.
LGMar 8, 2017
A Manifold Approach to Learning Mutually Orthogonal SubspacesStephen Giguere, Francisco Garcia, Sridhar Mahadevan
Although many machine learning algorithms involve learning subspaces with particular characteristics, optimizing a parameter matrix that is constrained to represent a subspace can be challenging. One solution is to use Riemannian optimization methods that enforce such constraints implicitly, leveraging the fact that the feasible parameter values form a manifold. While Riemannian methods exist for some specific problems, such as learning a single subspace, there are more general subspace constraints that offer additional flexibility when setting up an optimization problem, but have not been formulated as a manifold. We propose the partitioned subspace (PS) manifold for optimizing matrices that are constrained to represent one or more subspaces. Each point on the manifold defines a partitioning of the input space into mutually orthogonal subspaces, where the number of partitions and their sizes are defined by the user. As a result, distinct groups of features can be learned by defining different objective functions for each partition. We illustrate the properties of the manifold through experiments on multiple dataset analysis and domain adaptation.
LGNov 5, 2016
Generative Multi-Adversarial NetworksIshan Durugkar, Ian Gemp, Sridhar Mahadevan
Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the \emph{Generative Multi-Adversarial Network} (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax objective to accelerate training early on. In contrast, GMAN can be reliably trained with the original, untampered objective. We explore a number of design perspectives with the discriminator role ranging from formidable adversary to forgiving teacher. Image generation tasks comparing the proposed framework to standard GANs demonstrate GMAN produces higher quality samples in a fraction of the iterations when measured by a pairwise GAM-type metric.
LGAug 29, 2016
Online Monotone OptimizationIan Gemp, Sridhar Mahadevan
This paper presents a new framework for analyzing and designing no-regret algorithms for dynamic (possibly adversarial) systems. The proposed framework generalizes the popular online convex optimization framework and extends it to its natural limit allowing it to capture a notion of regret that is intuitive for more general problems such as those encountered in game theory and variational inequalities. The framework hinges on a special choice of a system-wide loss function we have developed. Using this framework, we prove that a simple update scheme provides a no-regret algorithm for monotone systems. While previous results in game theory prove individual agents can enjoy unilateral no-regret guarantees, our result proves monotonicity sufficient for guaranteeing no-regret when considering the adjustments of multiple agent strategies in parallel. Furthermore, to our knowledge, this is the first framework to provide a suitable notion of regret for variational inequalities. Most importantly, our proposed framework ensures monotonicity a sufficient condition for employing multiple online learners safely in parallel.
LGAug 21, 2016
Inverting Variational Autoencoders for Improved Generative AccuracyIan Gemp, Ishan Durugkar, Mario Parente et al.
Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$). In the case where the codomain has known structure, a large unfeatured dataset ($\mathbf{y}$) is potentially available. We develop a parameter-efficient, deep semi-supervised generative model for the purpose of exploiting this untapped data source. Empirical results show improved performance in disentangling latent variable semantics as well as improved discriminative prediction on Martian spectroscopic and handwritten digit domains.
LGJun 15, 2016
Deep Reinforcement Learning With Macro-ActionsIshan P. Durugkar, Clemens Rosenbaum, Stefan Dernbach et al.
Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement learning approaches. We concentrate on macro-actions, and evaluate these on different Atari 2600 games, where we show that they yield significant improvements in learning speed. Additionally, we show that they can even achieve better scores than DQN. We offer analysis and explanation for both convergence and final results, revealing a problem deep RL approaches have with sparse reward signals.
CLJul 28, 2015
Reasoning about Linguistic Regularities in Word Embeddings using Matrix ManifoldsSridhar Mahadevan, Sarath Chandar
Recent work has explored methods for learning continuous vector space word representations reflecting the underlying semantics of words. Simple vector space arithmetic using cosine distances has been shown to capture certain types of analogies, such as reasoning about plurals from singulars, past tense from present tense, etc. In this paper, we introduce a new approach to capture analogies in continuous word representations, based on modeling not just individual word vectors, but rather the subspaces spanned by groups of words. We exploit the property that the set of subspaces in n-dimensional Euclidean space form a curved manifold space called the Grassmannian, a quotient subgroup of the Lie group of rotations in n- dimensions. Based on this mathematical model, we develop a modified cosine distance model based on geodesic kernels that captures relation-specific distances across word categories. Our experiments on analogy tasks show that our approach performs significantly better than the previous approaches for the given task.
LGMay 26, 2014
Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual SpacesSridhar Mahadevan, Bo Liu, Philip Thomas et al.
In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees, and remains in a stable region of the parameter space (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner, and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal dual spaces connected through the use of a Legendre transform. This allows temporal difference updates to occur in dual spaces, allowing a variety of important technical advantages. The Legendre transform elegantly generalizes past algorithms for solving reinforcement learning problems, such as natural gradient methods, which we show relate closely to the previously unconnected framework of mirror descent methods. Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in recent variants of gradient-based temporal difference learning. This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning. Finally, Legendre transforms enable a variety of other benefits, including modeling sparsity and domain geometry. Our work builds extensively on recent work on the convergence of saddle-point algorithms, and on the theory of monotone operators.
AIJan 10, 2013
Decision-Theoretic Planning with Concurrent Temporally Extended ActionsKhashayar Rohanimanesh, Sridhar Mahadevan
We investigate a model for planning under uncertainty with temporallyextended actions, where multiple actions can be taken concurrently at each decision epoch. Our model is based on the options framework, and combines it with factored state space models,where the set of options can be partitioned into classes that affectdisjoint state variables. We show that the set of decisionepochs for concurrent options defines a semi-Markov decisionprocess, if the underlying temporally extended actions being parallelized arerestricted to Markov options. This property allows us to use SMDPalgorithms for computing the value function over concurrentoptions. The concurrent options model allows overlapping execution ofoptions in order to achieve higher performance or in order to performa complex task. We describe a simple experiment using a navigationtask which illustrates how concurrent options results in a faster planwhen compared to the case when only one option is taken at a time.
LGOct 16, 2012
Sparse Q-learning with Mirror DescentSridhar Mahadevan, Bo Liu
This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimization of convex functions in highdimensional spaces. Unlike traditional gradient methods, mirror descent undertakes gradient updates of weights in both the dual space and primal space, which are linked together using a Legendre transform. Mirror descent can be viewed as a proximal algorithm where the distance generating function used is a Bregman divergence. A new class of proximal-gradient based temporal-difference (TD) methods are presented based on different Bregman divergences, which are more powerful than regular TD learning. Examples of Bregman divergences that are studied include p-norm functions, and Mahalanobis distance based on the covariance of sample gradients. A new family of sparse mirror-descent reinforcement learning methods are proposed, which are able to find sparse fixed points of an l1-regularized Bellman equation at significantly less computational cost than previous methods based on second-order matrix methods. An experimental study of mirror-descent reinforcement learning is presented using discrete and continuous Markov decision processes.
AIJul 4, 2012
Representation Policy IterationSridhar Mahadevan
This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the largescale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) are presented that automatically learn both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two handcoded basis functions (RBF and polynomial state encodings).