Thomas J. Ringstrom

AI
h-index: 32
Papers: 4
Citations: 8
Novelty: 60%
AI Score: 34

4 Papers

AI · Nov 20, 2022
Reward is not Necessary: How to Create a Modular & Compositional Self-Preserving Agent for Life-Long Learning

Thomas J. Ringstrom

Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a lifetime, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, because such a weighting would need to encode information about good and bad combinations before the agent has any experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to rely only on the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map the initial state and time at which an agent begins a policy to the final states and times at which it completes a goal. Because these functions are hierarchical operators, we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure and spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth and maintenance of their own structural integrity without recourse to reward-maximization.
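Empowerment is commonly defined as the channel capacity from an agent's actions to its future states, that is, the maximum over p(a) of the mutual information I(A; S' | s). As a minimal sketch of that definition only (not the hierarchical, feasibility-function version proposed above), the Python below estimates the one-step empowerment of a single state with the Blahut-Arimoto algorithm. The function name `empowerment`, the list-of-lists channel layout, and the choice of Blahut-Arimoto as the solver are assumptions made for illustration.

```python
import math


def empowerment(channel, iters=500, tol=1e-12):
    """One-step empowerment (in bits) of a fixed state via Blahut-Arimoto.

    Assumed layout: channel[a][j] = p(s' = j | s, a), the probability that
    action a (or an action sequence, for n-step empowerment) lands the agent
    in state j. Empowerment = max over p(a) of the mutual information I(A; S').
    """
    n_a, n_s = len(channel), len(channel[0])

    def marginal(q):
        # r(j) = sum_a q(a) p(j | a): distribution over successor states
        return [sum(q[a] * channel[a][j] for a in range(n_a)) for j in range(n_s)]

    q = [1.0 / n_a] * n_a  # start from a uniform action distribution
    for _ in range(iters):
        r = marginal(q)
        # exp(KL(p(.|a) || r)) per action: actions whose outcomes are more
        # distinguishable from the marginal receive more weight
        c = [
            math.exp(sum(p * math.log(p / r[j]) for j, p in enumerate(row) if p > 0))
            for row in channel
        ]
        z = sum(qa * ca for qa, ca in zip(q, c))
        new_q = [qa * ca / z for qa, ca in zip(q, c)]
        done = max(abs(x - y) for x, y in zip(new_q, q)) < tol
        q = new_q
        if done:
            break

    # mutual information I(A; S') in bits under the optimized action distribution
    r = marginal(q)
    return sum(
        q[a] * p * math.log2(p / r[j])
        for a in range(n_a)
        for j, p in enumerate(channel[a])
        if p > 0 and q[a] > 0
    )
```

For a deterministic channel the result reduces to log2 of the number of distinct reachable states: four actions leading to four different states give 2 bits, while four actions that all lead to the same state give 0 bits.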

LG · Jun 11, 2025
A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes

Thomas J. Ringstrom, Paul R. Schrater

We introduce Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process. Rather than a value function, OKBEs directly construct and optimize a predictive map called a state-time option kernel (STOK) to maximize the probability of completing a goal while avoiding constraint violations. STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning. This means that: 1) STOKs can be composed using Chapman-Kolmogorov equations to make spatiotemporal predictions for multiple policies over long horizons, 2) high-dimensional STOKs can be represented and computed efficiently in a factorized and reconfigurable form, and 3) STOKs record the probabilities of semantically interpretable goal-success and constraint-violation events, which are needed for formal verification. Given a high-dimensional state-transition model for an intractable planning problem, we can decompose it into local STOKs and goal-conditioned policies that are aggregated into a factorized goal kernel, making it possible to forward-plan at the level of goals in high dimensions to solve the problem. These properties lead to highly flexible agents that can rapidly synthesize meta-policies, reuse planning representations across many tasks, and justify goals using empowerment, an intrinsic motivation function. We argue that reward-maximization is in conflict with the properties of compositionality, modularity, and interpretability. By contrast, OKBEs facilitate these properties to support verifiable long-horizon planning and intrinsic motivation that scales to dynamic high-dimensional world-models.
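The Chapman-Kolmogorov composition of state-time kernels can be illustrated with a small sketch. Here a state-time kernel is assumed to be a plain dictionary mapping an initiation `(state, time)` pair to a distribution over termination `(state, time)` pairs, and any probability mass that does not appear in a row is read as goal failure or constraint violation. This representation, the `compose` and `success_probability` functions, and the example kernels are hypothetical; the factorized STOK representation described in the abstract is not reproduced here.

```python
from collections import defaultdict

# Hypothetical representation: (state, time) -> {(state, time): probability}.
# Probability mass missing from a row is read as failure (an assumption).
StateTime = tuple[str, int]
Kernel = dict[StateTime, dict[StateTime, float]]


def compose(k1: Kernel, k2: Kernel) -> Kernel:
    """Chapman-Kolmogorov composition: run k1's policy, then k2's policy
    from wherever (and whenever) k1 terminated."""
    out: Kernel = {}
    for start, mids in k1.items():
        acc = defaultdict(float)
        for mid, p1 in mids.items():
            # sum over intermediate state-times: P(z | x) = sum_y P(y | x) P(z | y)
            for end, p2 in k2.get(mid, {}).items():
                acc[end] += p1 * p2
        out[start] = dict(acc)
    return out


def success_probability(kernel: Kernel, start: StateTime) -> float:
    """Total probability that the (composed) policy reaches any termination pair."""
    return sum(kernel.get(start, {}).values())


# Hypothetical kernels for two goal-conditioned policies.
to_door: Kernel = {("home", 0): {("door", 3): 0.9, ("door", 4): 0.1}}
to_key: Kernel = {
    ("door", 3): {("key", 5): 0.8},  # 0.2 of the mass fails (e.g., a constraint violation)
    ("door", 4): {("key", 7): 1.0},
}

plan = compose(to_door, to_key)
```

Summing the composed distribution gives the probability that both policies succeed in sequence (about 0.82 in this example), and the time index of each termination pair records when that success happens, which is what lets composed kernels make spatiotemporal predictions.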

AI · Jul 6, 2020
Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options

Thomas J. Ringstrom, Mohammadhosein Hasanbeig, Alessandro Abate

In the domain of hierarchical planning, compositionality, abstraction, and task transfer are crucial for designing algorithms that can efficiently solve a variety of problems with maximal representational reuse. Many real-world problems require non-Markovian policies to handle complex structured tasks with logical conditions, often leading to prohibitively large state representations; this calls for efficient methods for breaking these problems down and reusing structure between tasks. To this end, we introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP) to address the complexity of solving non-Markovian Boolean sub-goal tasks with ordering constraints. LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning. LMDPs can be efficiently solved as a principal eigenvector problem, and options are policies with termination conditions used as temporally extended actions; with LS-GKDP we extend LMDPs to control over options for logical tasks. This involves decomposing a high-dimensional problem into a set of goal-conditioned options, one for each goal, and constructing a goal kernel, which is an abstract transition kernel that jumps from an option's initial states to its termination states along with an update of the higher-level task state. We show how an LMDP with a goal kernel enables the efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding. Options can also be remapped to new problems within a super-exponential space of tasks without significant recomputation, and we identify cases where the solution is invariant to the task grounding, permitting zero-shot task transfer.
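The principal-eigenvector view of an LMDP can be sketched for the standard average-cost formulation: with passive dynamics P and state cost q, the desirability vector z satisfies λ z = diag(exp(-q)) P z, and the optimal controlled dynamics are u*(s'|s) ∝ P(s'|s) z(s'). The sketch below uses power iteration and assumes the matrix diag(exp(-q)) P is irreducible and aperiodic so that the iteration converges. It illustrates the LMDP building block only, not the goal kernel or the options layer of LS-GKDP; `solve_lmdp`, its arguments, and the three-state example are hypothetical.

```python
import math


def solve_lmdp(passive, cost, iters=10_000, tol=1e-12):
    """Average-cost LMDP via power iteration on G = diag(exp(-q)) P.

    passive[i][j] = P(j | i), the uncontrolled (passive) dynamics
    cost[i]       = q(i), the state cost
    Returns (z, lam, policy): z is the principal eigenvector (desirability,
    scaled so max(z) == 1), lam its eigenvalue (average cost = -log(lam)),
    and policy[i][j] = u*(j | i), proportional to P(j | i) z(j).
    Assumes G is irreducible and aperiodic so power iteration converges.
    """
    n = len(passive)
    g = [math.exp(-c) for c in cost]
    z = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        # one power-iteration step: nz = G z, then rescale so max(nz) == 1
        nz = [g[i] * sum(passive[i][j] * z[j] for j in range(n)) for i in range(n)]
        lam = max(nz)
        nz = [v / lam for v in nz]
        done = max(abs(a - b) for a, b in zip(nz, z)) < tol
        z = nz
        if done:
            break
    # optimal controlled transitions reweight the passive dynamics by z
    policy = []
    for i in range(n):
        w = [passive[i][j] * z[j] for j in range(n)]
        total = sum(w)
        policy.append([x / total for x in w])
    return z, lam, policy


# Hypothetical 3-state example: passive dynamics jump uniformly to any state,
# and state 0 is the only zero-cost state.
P = [[1 / 3] * 3 for _ in range(3)]
z, lam, policy = solve_lmdp(P, [0.0, 1.0, 1.0])
```

With uniform passive dynamics the desirability comes out proportional to exp(-q), so the controlled policy shifts probability toward the zero-cost state from every starting state. Because the optimal policy is obtained in closed form from z, no search over actions is needed, which is what makes the LMDP "linearly solvable".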

AI · Jan 29, 2019
Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning

Thomas J. Ringstrom, Paul R. Schrater

Problems arise when reward functions are used to capture dependencies between sequential, time-constrained goal states, because the state space must be prohibitively expanded to accommodate a history of successfully achieved sub-goals. In addition, policies and value functions derived under stationarity assumptions are not readily decomposable, leading to a tension between reward maximization and task generalization. We demonstrate a logic-compatible approach that uses model-based knowledge of environment dynamics and deadline information to directly infer non-stationary policies composed of reusable stationary policies. The policies are constructed to maximize the probability of satisfying time-sensitive goals while respecting time-varying obstacles. Our approach explicitly maintains two different spaces: a high-level logical task specification and the low-level state space of a Markov decision process, onto which the task variables are grounded. Computing satisfiability at the task level is made possible by a Bellman-like equation that operates on a tensor encoding the temporal relationship between the two spaces; the equation solves for a value function that can be explicitly interpreted as the probability of sub-goal satisfaction under the synthesized non-stationary policy, an approach we term Constraint Satisfaction Propagation (CSP).
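The idea of maximizing the probability of satisfying a time-sensitive goal while avoiding time-varying obstacles can be sketched with ordinary finite-horizon backward induction, where V[t][s] is the probability of reaching the goal by the deadline from state s at time t. This is a generic reachability recursion, not the tensor-based CSP equation described above; `deadline_policy`, its arguments, and the corridor example are hypothetical. Because the obstacle depends on time, the resulting policy is indexed by time and is therefore non-stationary.

```python
def deadline_policy(states, actions, trans, goal, blocked, deadline):
    """Backward induction for the probability of reaching `goal` by `deadline`
    without ever occupying a state that is blocked at that time step.

    trans(s, a)   -> list of (next_state, probability) pairs
    blocked(s, t) -> True if s is an obstacle at time t (time-varying)
    Returns (V, pi): V[t][s] is the success probability from (s, t) and
    pi[t][s] is a maximizing action; pi is indexed by time (non-stationary).
    """
    V = [{} for _ in range(deadline + 1)]
    pi = [{} for _ in range(deadline + 1)]
    for t in range(deadline, -1, -1):
        for s in states:
            if blocked(s, t):
                V[t][s] = 0.0  # constraint violation
            elif s == goal:
                V[t][s] = 1.0  # goal reached in time (treated as absorbing success)
            elif t == deadline:
                V[t][s] = 0.0  # out of time
            else:
                best_a, best_v = None, -1.0
                for a in actions:
                    v = sum(p * V[t + 1][s2] for s2, p in trans(s, a))
                    if v > best_v:
                        best_a, best_v = a, v
                V[t][s], pi[t][s] = best_v, best_a
    return V, pi


# Hypothetical example: a 4-cell corridor with the goal at cell 3.
# Cell 1 is blocked only at time 1, so the agent must wait before moving.
def corridor_trans(s, a):
    return [(min(s + 1, 3), 1.0)] if a == "right" else [(s, 1.0)]


def corridor_blocked(s, t):
    return s == 1 and t == 1


V, pi = deadline_policy(range(4), ["stay", "right"], corridor_trans,
                        goal=3, blocked=corridor_blocked, deadline=4)
```

In the example, the agent stays in cell 0 at time 0 because cell 1 is blocked at time 1, then moves right from time 1 onward, so the same state maps to different actions at different times. With a deadline of 3 instead of 4, the success probability from cell 0 drops to zero.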