Indranil Saha

h-index: 26

Papers: 5

Citations: 45

Novelty: 56%

AI Score: 28

Ranked #149,920 of 194,257 authors (top 77%); #4,630 in RO (top 69%)

5 Papers

9.0 · AI · Dec 2, 2022 · Code

STL-Based Synthesis of Feedback Controllers Using Reinforcement Learning

Nikhil Kumar Singh, Indranil Saha

Deep Reinforcement Learning (DRL) has the potential to be used for synthesizing feedback controllers (agents) for various complex systems with unknown dynamics. These systems are expected to satisfy diverse safety and liveness properties best captured using temporal logic. In RL, the reward function plays a crucial role in specifying the desired behaviour of these agents. However, the problem of designing the reward function for an RL agent to satisfy complex temporal logic specifications has received limited attention in the literature. To address this, we provide a systematic way of generating rewards in real-time by using the quantitative semantics of Signal Temporal Logic (STL), a widely used temporal logic to specify the behaviour of cyber-physical systems. We propose a new quantitative semantics for STL having several desirable properties, making it suitable for reward generation. We evaluate our STL-based reinforcement learning mechanism on several complex continuous control benchmarks and compare our STL semantics with those available in the literature in terms of their efficacy in synthesizing the controller agent. Experimental results establish our new semantics to be the most suitable for synthesizing feedback controllers for complex continuous dynamical systems through reinforcement learning.
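As a rough illustration of how quantitative STL semantics can turn a trace into a scalar reward, the sketch below uses the classical min/max robustness semantics from the literature, not the new semantics proposed in this paper. The function names, the sample trace, and the thresholds are all hypothetical.

```python
from typing import Sequence


def rob_always_gt(signal: Sequence[float], c: float) -> float:
    """Classical robustness of G(x > c): the worst-case margin over the trace."""
    return min(x - c for x in signal)


def rob_eventually_gt(signal: Sequence[float], c: float) -> float:
    """Classical robustness of F(x > c): the best-case margin over the trace."""
    return max(x - c for x in signal)


def rob_and(r1: float, r2: float) -> float:
    """Classical robustness of a conjunction: the smaller of the two margins."""
    return min(r1, r2)


# Hypothetical sampled trace of a single state variable x.
trace = [0.5, 0.8, 1.2, 0.9]

# Specification (assumed for illustration): G(x > 0) AND F(x > 1).
# A positive value means the trace satisfies the formula; the magnitude
# says by how much, which is what makes it usable as a reward signal.
reward = rob_and(rob_always_gt(trace, 0.0), rob_eventually_gt(trace, 1.0))
print(reward)
```

A negative result would indicate a violation, so an agent maximizing this value is pushed toward satisfying the specification with a margin.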

4.1 · RO · Mar 15, 2024

Online Concurrent Multi-Robot Coverage Path Planning

Ratijit Mitra, Indranil Saha

Recently, centralized receding horizon online multi-robot coverage path planning algorithms have shown remarkable scalability in thoroughly exploring large, complex, unknown workspaces with many robots. In a horizon, the path planning and the path execution interleave, meaning when the path planning occurs for robots with no paths, the robots with outstanding paths do not execute, and subsequently, when the robots with new or outstanding paths execute to reach respective goals, path planning does not occur for those robots yet to get new paths, leading to wastage of both the robotic and the computation resources. As a remedy, we propose a centralized algorithm that is not horizon-based. It plans paths at any time for a subset of robots with no paths, i.e., those that have reached their previously assigned goals, while the rest execute their outstanding paths, thereby enabling concurrent planning and execution. We formally prove that the proposed algorithm ensures complete coverage of an unknown workspace and analyze its time complexity. To demonstrate scalability, we evaluate our algorithm to cover eight large 2D grid benchmark workspaces with up to 512 aerial and ground robots, respectively. A comparison with a state-of-the-art horizon-based algorithm shows its superiority in completing the coverage with up to 1.6x speedup. For validation, we perform ROS + Gazebo simulations in six 2D grid benchmark workspaces with 10 quadcopters and TurtleBots, respectively. We also successfully conducted one outdoor experiment with three quadcopters and one indoor with two TurtleBots.

2.6 · LG · Feb 5, 2024

Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences

Nikhil Kumar Singh, Indranil Saha

Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer during the exploration with the goal of reducing the buffer size and maintaining the independent and identically distributed (IID) nature of the samples. Our method is based on selecting an important subset of the set of state variables from the experiences encountered during the initial phase of random exploration, partitioning the state space into a set of abstract states based on the selected important state variables, and finally selecting the experiences with unique state-reward combination by using a kernel density estimator. We formally prove that the off-policy actor-critic algorithm incorporating the proposed method for unique experience accumulation converges faster than the vanilla off-policy actor-critic algorithm. Furthermore, we evaluate our method by comparing it with two state-of-the-art actor-critic RL algorithms on several continuous control benchmarks available in the Gym environment. Experimental results demonstrate that our method achieves a significant reduction in the size of the replay buffer for all the benchmarks while achieving either faster convergence or better reward accumulation compared to the baseline algorithms.
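The idea of admitting only experiences with a previously unseen abstract state-reward combination can be sketched as follows. This is a simplified stand-in, not the paper's method: it replaces the kernel density estimator with exact set membership over fixed-width bins, and it takes the important state variables as given rather than selecting them. The class name `UniqueBuffer`, the `important` indices, and `bin_width` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


class UniqueBuffer:
    """Replay buffer that keeps one experience per abstract (state, reward) cell."""

    def __init__(self, important: Sequence[int], bin_width: float = 0.5) -> None:
        self.important = list(important)   # indices of the state variables used for abstraction
        self.bin_width = bin_width         # assumed uniform bin width for states and rewards
        self._seen: Set[Tuple[Tuple[int, ...], int]] = set()
        self.items: List[Tuple[Tuple[float, ...], float]] = []

    def _key(self, state: Sequence[float], reward: float) -> Tuple[Tuple[int, ...], int]:
        # Map the important state variables and the reward to integer bin indices.
        cells = tuple(math.floor(state[i] / self.bin_width) for i in self.important)
        return cells, math.floor(reward / self.bin_width)

    def add(self, state: Sequence[float], reward: float) -> bool:
        """Store the experience only if its abstract cell is new; report whether it was stored."""
        key = self._key(state, reward)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.items.append((tuple(state), reward))
        return True


buf = UniqueBuffer(important=[0], bin_width=0.5)
first = buf.add((0.1, 9.0), 1.0)    # new cell, stored
second = buf.add((0.4, -3.0), 1.2)  # same state bin and reward bin, skipped
third = buf.add((0.6, 0.0), 1.0)    # different state bin, stored
print(first, second, third, len(buf.items))
```

Exact binning is cheaper than density estimation but sensitive to where bin edges fall, which is one reason a smoother estimator may be preferable in practice.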

11.6 · RO · Mar 4, 2021

MT* : Multi-Robot Path Planning for Temporal Logic Specifications

Dhaval Gujarathi, Indranil Saha

We address the path planning problem for a team of robots satisfying a complex high-level mission specification given in the form of a Linear Temporal Logic (LTL) formula. The state-of-the-art approach to this problem employs the automata-theoretic model checking technique. This approach involves computation of a product graph of the Büchi automaton generated from the LTL specification and a joint transition system which captures the collective motion of the robots, and then computation of the shortest path using Dijkstra's shortest path algorithm. We propose MT*, an algorithm that significantly reduces the computational burden of generating such plans for multi-robot systems. Our approach generates a reduced version of the product graph without computing the complete joint transition system, which is computationally expensive. It then divides the complete mission specification among the participating robots and generates the trajectories for the individual robots independently. Our approach demonstrates substantial speedup in terms of computation time over the state-of-the-art approach and, unlike the state-of-the-art approach, scales well with both the number of robots and the size of the workspace.
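The baseline idea of searching a product of a transition system and an automaton with Dijkstra's algorithm can be sketched for a single robot as below. This is a deliberately simplified illustration and not MT*: it uses a deterministic automaton with plain reachability acceptance instead of a Büchi automaton with lasso-shaped runs, and it builds the product lazily during the search. The function name, the three-node workspace, and the labels are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def product_shortest_path(
    ts: Dict[str, List[Tuple[str, float]]],   # workspace node -> [(next node, cost)]
    label: Dict[str, str],                    # workspace node -> symbol observed there
    delta: Dict[Tuple[str, str], str],        # (automaton state, symbol) -> automaton state
    q0: str,
    accepting: Set[str],
    start: str,
) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra over (workspace node, automaton state) pairs, expanded on demand."""
    q_init = delta.get((q0, label[start]))
    if q_init is None:
        return None
    source = (start, q_init)
    dist = {source: 0.0}
    parent: Dict[Tuple[str, str], Optional[Tuple[str, str]]] = {source: None}
    heap = [(0.0, 0, source)]
    counter = 1  # tie-breaker so equal-cost entries are ordered by insertion
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        s, q = node
        if q in accepting:
            path = []
            cur: Optional[Tuple[str, str]] = node
            while cur is not None:
                path.append(cur[0])
                cur = parent[cur]
            return d, path[::-1]
        for s2, cost in ts.get(s, []):
            q2 = delta.get((q, label[s2]))
            if q2 is None:
                continue  # the automaton forbids this move
            nd = d + cost
            if nd < dist.get((s2, q2), float("inf")):
                dist[(s2, q2)] = nd
                parent[(s2, q2)] = node
                heapq.heappush(heap, (nd, counter, (s2, q2)))
                counter += 1
    return None


# Hypothetical workspace and an automaton for "eventually visit a goal cell".
ts = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0)], "C": []}
label = {"A": "none", "B": "none", "C": "goal"}
delta = {("q0", "none"): "q0", ("q0", "goal"): "q1", ("q1", "none"): "q1", ("q1", "goal"): "q1"}
result = product_shortest_path(ts, label, delta, "q0", {"q1"}, "A")
print(result)
```

With several robots, the workspace side of this product becomes the joint transition system, whose size grows exponentially with the team size; that growth is the cost the abstract says MT* avoids.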

5.3 · RO · Feb 24, 2021

Mobile Recharger Path Planning and Recharge Scheduling in a Multi-Robot Environment

Tanmoy Kundu, Indranil Saha

In many multi-robot applications, mobile worker robots are often engaged in performing some tasks repetitively by following pre-computed trajectories. As these robots are battery-powered, they need to get recharged at regular intervals. We envision that in the future, a few mobile recharger robots will be employed to supply charge to the energy-deficient worker robots recurrently, to keep the overall efficiency of the system optimized. In this setup, we need to find the time instants and locations for the meeting of the worker robots and recharger robots optimally. We present a Satisfiability Modulo Theory (SMT)-based approach that captures the activities of the robots in the form of constraints in a sufficiently long finite-length time window (hypercycle) whose repetitions provide their perpetual behavior. Our SMT encoding ensures that for a chosen length of the hypercycle, the total waiting time of the worker robots due to charge constraints is minimized under a certain condition, and is close to optimal when the condition does not hold. Moreover, the recharger robots follow the most energy-efficient trajectories. We show the efficacy of our approach by comparing it with another variant of the SMT-based method which is not scalable but provides an optimal solution globally, and with a greedy algorithm.