AI | Sep 5, 2024 | Code
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management
Pranay Thangeda, Trevor S. Betz, Michael N. Grussing et al.
Efficient management of infrastructure systems is crucial for economic stability, sustainability, and public safety. However, infrastructure sustainment is challenging due to the vast scale of systems, stochastic deterioration of components, partial observability, and resource constraints. Decision-making strategies that rely solely on human judgment often result in suboptimal decisions over large scales and long horizons. While data-driven approaches like reinforcement learning offer promising solutions, their application has been limited by the lack of suitable simulation environments. We present InfraLib, an open-source modular and extensible framework that enables modeling and analyzing infrastructure management problems with resource constraints as sequential decision-making problems. The framework implements hierarchical, stochastic deterioration models, supports realistic partial observability, and handles practical constraints including cyclical budgets and component unavailability. InfraLib provides standardized environments for benchmarking decision-making approaches, along with tools for expert data collection and policy evaluation. Through case studies on both synthetic benchmarks and real-world road networks, we demonstrate InfraLib's ability to model diverse infrastructure management scenarios while maintaining computational efficiency at scale.
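The kind of problem the abstract describes can be illustrated with a toy simulation. The sketch below is not InfraLib's API. Every name and parameter in it (`simulate`, `decay_prob`, `cycle_length`, and so on) is a hypothetical chosen for illustration. It shows only two of the listed ingredients, stochastic deterioration and a cyclical budget that resets without carry-over, and it leaves out hierarchy and partial observability.

```python
import random


def simulate(num_components=5, horizon=12, budget_per_cycle=2.0, cycle_length=4,
             repair_cost=1.0, decay_prob=0.3, threshold=2, max_condition=4, seed=0):
    """Toy infrastructure simulation (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    condition = [max_condition] * num_components  # every component starts as new
    budget = 0.0
    repairs = 0
    for t in range(horizon):
        # Cyclical budget: refilled at the start of each cycle, unused funds are lost.
        if t % cycle_length == 0:
            budget = budget_per_cycle
        # Stochastic deterioration: each component drops one level with decay_prob.
        for i in range(num_components):
            if condition[i] > 0 and rng.random() < decay_prob:
                condition[i] -= 1
        # Simple threshold policy: repair the worst components first while funds last.
        for i in sorted(range(num_components), key=lambda j: condition[j]):
            if condition[i] <= threshold and budget >= repair_cost:
                condition[i] = max_condition
                budget -= repair_cost
                repairs += 1
    return condition, repairs


final_condition, total_repairs = simulate()
```

A learned policy would replace the threshold rule in the last loop. The fixed seed keeps runs reproducible so that different policies can be compared on the same deterioration trajectory.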
OC | Mar 18, 2023
Welfare Maximization Algorithm for Solving Budget-Constrained Multi-Component POMDPs
Manav Vora, Pranay Thangeda, Michael N. Grussing et al.
Partially Observable Markov Decision Processes (POMDPs) provide an efficient way to model real-world sequential decision-making processes. Motivated by the problem of maintenance and inspection of a group of infrastructure components with independent dynamics, this paper presents an algorithm to find the optimal policy for a multi-component budget-constrained POMDP. We first introduce a budgeted-POMDP model (b-POMDP), which enables us to find the optimal policy for a POMDP while adhering to budget constraints. Next, we prove that the value function, i.e., the maximal collected reward, of a b-POMDP is a concave function of the budget in the finite-horizon case. Our second contribution is an algorithm to calculate the optimal policy for a multi-component budget-constrained POMDP by finding the optimal budget split among the individual component POMDPs. The optimal budget split is posed as a welfare maximization problem, and the solution is computed by exploiting the concavity of the value function. We illustrate the effectiveness of the proposed algorithm by deriving a maintenance and inspection policy for a group of real-world infrastructure components with different deterioration dynamics and different inspection and maintenance costs. We show that the proposed algorithm vastly outperforms the policy currently used in practice.
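To see why concavity helps with the budget split, consider a simplified discrete version of the welfare maximization problem. The sketch below is not the paper's algorithm. It is the standard greedy marginal-gain allocation, which is optimal when each component's value is a concave function of an integer budget. The value functions `w * sqrt(b)` are hypothetical stand-ins for the b-POMDP value functions, and `split_budget` is an illustrative name.

```python
import heapq
import math


def split_budget(value_fns, total_units):
    """Greedily assign integer budget units to the component with the largest
    marginal gain. Optimal only if every value function is concave."""
    alloc = [0] * len(value_fns)
    # Max-heap (via negated gains) of the gain from each component's next unit.
    heap = [(-(v(1) - v(0)), i) for i, v in enumerate(value_fns)]
    heapq.heapify(heap)
    for _ in range(total_units):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        v = value_fns[i]
        # Concavity guarantees this next gain is no larger than the one just used.
        heapq.heappush(heap, (-(v(alloc[i] + 1) - v(alloc[i])), i))
    return alloc


# Hypothetical concave value functions V_i(b) = w_i * sqrt(b).
value_fns = [lambda b, w=w: w * math.sqrt(b) for w in (1.0, 2.0, 3.0)]
allocation = split_budget(value_fns, 14)
total_value = sum(v(b) for v, b in zip(value_fns, allocation))
```

With these stand-in functions, the continuous optimum allocates budget in proportion to the squared weights, so 14 units split as 1, 4, and 9. Because marginal gains only shrink under concavity, a unit given to the currently best component never needs to be taken back, which is what makes the split cheap to compute.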
LG | Aug 13, 2024
Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning
Manav Vora, Jonas Liang, Michael N. Grussing et al.
Monotonic Partially Observable Markov Decision Processes (POMDPs), where the system state progressively decreases until a restorative action is performed, can be used to model sequential repair problems effectively. This paper considers the problem of solving budget-constrained multi-component monotonic POMDPs, where a finite budget limits the maximal number of restorative actions. For a large number of components, solving such a POMDP using current methods is computationally intractable because the state space grows exponentially with the number of components. To address this challenge, we propose a two-step approach. Since the individual components of a budget-constrained multi-component monotonic POMDP are connected only via the shared budget, we first approximate the optimal budget allocation among these components using an approximation of each component POMDP's optimal value function, obtained through a random forest model. Subsequently, we introduce an oracle-guided meta-trained Proximal Policy Optimization (PPO) algorithm to solve each of the independent budget-constrained single-component monotonic POMDPs. The oracle policy is obtained by performing value iteration on the corresponding monotonic Markov Decision Process (MDP). This two-step method provides scalability in solving truly massive multi-component monotonic POMDPs. To demonstrate the efficacy of our approach, we consider a real-world maintenance scenario that involves inspection and repair of an administrative building by a team of agents within a maintenance budget. Finally, we perform a computational complexity analysis for a varying number of components to show the scalability of the proposed approach.
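The oracle step can be pictured with plain value iteration on a small, fully observed monotonic MDP. The sketch below is a generic textbook implementation, not the paper's code. The state space, the rewards, and all parameter values (`decay_prob`, `repair_cost`, `failure_penalty`, `gamma`) are assumptions made for illustration. The condition level can only stay put or drop by one unless a repair restores it to the top level.

```python
def value_iteration(n_states=5, decay_prob=0.4, repair_cost=2.0,
                    failure_penalty=10.0, gamma=0.95, tol=1e-8):
    """Discounted value iteration on a toy monotonic MDP.

    States 0..n_states-1 are condition levels (0 = failed). "wait" lets the
    state drop by one with probability decay_prob; "repair" pays repair_cost
    and restores the top level. All numbers are illustrative assumptions.
    """
    top = n_states - 1
    V = [0.0] * n_states
    while True:
        new_V, policy = [], []
        for s in range(n_states):
            reward = -failure_penalty if s == 0 else 0.0
            worse = max(s - 1, 0)  # monotonic decay, failed state is absorbing
            q_wait = reward + gamma * (decay_prob * V[worse] + (1 - decay_prob) * V[s])
            q_repair = reward - repair_cost + gamma * V[top]
            new_V.append(max(q_wait, q_repair))
            policy.append("repair" if q_repair > q_wait else "wait")
        # Stop once the Bellman update changes no value by more than tol.
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V, policy
        V = new_V


values, oracle_policy = value_iteration()
```

In this toy model the resulting policy repairs a failed component and leaves a new one alone. A policy of this kind, computed with full state knowledge, is what could serve as a guidance signal for a learner that only sees noisy observations.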