SY Jun 1
Terminal Time and Angle-Constrained Nonlinear Intercept Guidance
Shivam Bajpai, Abhinav Sinha
This paper considers the problem of simultaneously controlling an interceptor's impact time and impact angle using its lateral acceleration as the sole control input. With a single control input, the nonlinear engagement kinematics is inherently underactuated, which complicates guidance law synthesis. To overcome this challenge, a hierarchical sliding mode-based guidance law is developed to concurrently regulate the two terminal constraints. The proposed architecture consists of a two-layer sliding manifold. The first layer comprises two sub-sliding surfaces corresponding to the impact time and impact angle error dynamics, respectively, while the second layer introduces a composite sliding manifold that combines the two individual sub-surfaces. Then, a variable-gain adaptive guidance law is designed to ensure time and angle-constrained interception against a stationary target, which is further extended to intercept a constant velocity target. Simulations are conducted for various engagement scenarios to attest to the efficacy of the proposed approach.
OC Mar 18, 2018
A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information
Deepanshu Vasal, Abhinav Sinha, Achilleas Anastasopoulos
We consider finite-horizon and infinite-horizon versions of a dynamic game with $N$ selfish players who observe their types privately and take actions that are publicly observed. Players' types evolve as conditionally independent Markov processes, conditioned on their current actions. Their actions and types jointly determine their instantaneous rewards. In dynamic games with asymmetric information, a widely used concept of equilibrium is perfect Bayesian equilibrium (PBE), which consists of a strategy and belief pair that simultaneously satisfy sequential rationality and belief consistency. In general, there does not exist a universal algorithm that decouples the interdependence of strategies and beliefs over time in calculating PBE. In this paper, for the finite-horizon game with independent types, we develop a two-step backward-forward recursive algorithm that sequentially decomposes the problem (w.r.t. time) to obtain a subset of PBEs, which we refer to as structured perfect Bayesian equilibria (SPBE). In such equilibria, a player's strategy depends on its history only through a common public belief and its current private type. The backward recursive part of this algorithm defines an equilibrium generating function. Each period in the backward recursion involves solving a fixed-point equation on the space of probability simplexes for every possible belief on types. Using this function, equilibrium strategies and beliefs are generated through a forward recursion. We then extend this methodology to the infinite-horizon model, where we propose a time-invariant single-shot fixed-point equation, which, in conjunction with a forward recursive step, generates the SPBE. Sufficient conditions for the existence of SPBE are provided. With our proposed method, we find equilibria that exhibit signaling behavior. This is illustrated with the help of a concrete public goods example.
SY May 1, 2018
Control of a nonlinear continuous stirred tank reactor via event triggered sliding modes
Abhinav Sinha, Rajiv Kumar Mishra
Continuous stirred tank reactors (CSTRs) are central equipment in many chemical and biochemical industries and exhibit complex second-order nonlinear dynamics, which pose many design and control challenges. This work presents an event-driven sliding mode controller that regulates the temperature and concentration very close to the equilibrium points of the CSTR. The proposed controller guarantees stable closed-loop behavior over multiple operating points, even in the presence of disturbances and parametric uncertainties. The control is updated only when a predefined condition is violated, so the controller is relaxed while the system operates within tolerable limits of closed-loop performance. A novel dynamic event-triggering rule is presented to maintain the desired performance with minimum computational cost. The inter-event execution time is shown to be lower bounded by a finite positive quantity, which excludes Zeno behavior. Sliding mode control (SMC) combined with the event-triggering scheme retains the inherent robustness of traditional SMC and reduces the computational load on the controller. Simulation results validate the efficiency of the proposed controller.
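The event-triggering idea in this abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's dynamic triggering rule or its sliding mode controller, so the code below swaps in a generic static relative-threshold rule and a linear feedback on a scalar unstable plant. The function name `simulate_event_triggered` and every parameter value are assumptions chosen for illustration only.

```python
def simulate_event_triggered(a=1.0, k=3.0, sigma=0.2, eps=1e-3,
                             x0=1.0, dt=1e-3, t_end=10.0):
    """Euler simulation of x' = a*x + u with u = -k * x_held.

    x_held is refreshed only when the hold error exceeds a relative
    threshold. This is an illustrative static rule, NOT the dynamic
    rule or the sliding mode law of the paper.
    Returns (final_state, number_of_control_updates, number_of_steps).
    """
    steps = int(round(t_end / dt))
    x = x0
    x_held = x0      # last state sample sent to the controller
    updates = 1      # the initial sample counts as one update
    for _ in range(steps):
        error = x_held - x
        # Event condition: refresh the sample only when the hold error
        # is large relative to the current state (eps avoids triggering
        # on arbitrarily small errors near the origin).
        if abs(error) > sigma * abs(x) + eps:
            x_held = x
            updates += 1
        u = -k * x_held              # control is held between events
        x += dt * (a * x + u)        # forward Euler step of the plant
    return x, updates, steps


if __name__ == "__main__":
    x_final, updates, steps = simulate_event_triggered()
    print(f"final state {x_final:.5f}, {updates} updates in {steps} steps")
```

With these assumed values the closed loop stays stable because `k - a` exceeds `k * sigma`, and the control is recomputed on only a small fraction of the simulation steps, which is the computational saving that event triggering aims for.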
SY Oct 11, 2011
Optimal Power Allocation for Renewable Energy Source
Abhinav Sinha, Prasanna Chaporkar
Battery-powered transmitters face energy constraints; replenishing their energy from a renewable source (such as solar or wind power) can extend their lifetime. We consider the problem of finding the optimal power allocation under random channel conditions for a wireless transmitter such that the rate of information transfer is maximized. The transmitter is powered by a rechargeable battery that is periodically charged by a renewable source. The problem is formulated as a Markov decision process. The structural properties derived in this paper, such as the monotonicity of the optimal value and policy, will be of vital importance in understanding the kinds of algorithms and approximations needed in real-life scenarios. The effect of the curse of dimensionality, which is prevalent in dynamic programming problems, can thus be reduced. We show our results under the most general of assumptions.
LG Jun 16, 2023
Fairness in Preference-based Reinforcement Learning
Umer Siddique, Abhinav Sinha, Yongcan Cao
In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The main objective is to design control policies that can optimize multiple objectives while treating each objective fairly. Toward this objective, we design a new fairness-induced preference-based reinforcement learning approach, FPbRL. The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences, rather than the reward-based preferences used in PbRL, coupled with policy learning via maximizing a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
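For readers unfamiliar with the generalized Gini welfare function mentioned above, the following is a minimal sketch of its standard form: objective values are sorted from worst to best and combined with non-increasing weights, so the worst-off objective counts the most. The default weights (geometrically decaying, normalized to sum to one) and the function name are assumptions for illustration. The abstract does not state which weights FPbRL uses.

```python
def generalized_gini_welfare(utilities, weights=None):
    """Weighted sum of utilities sorted in ascending order.

    Weights must be non-increasing so that worse-off objectives get
    at least as much weight as better-off ones.
    """
    ordered = sorted(float(u) for u in utilities)   # worst-off first
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one utility is required")
    if weights is None:
        # Assumed default: 1, 1/2, 1/4, ... normalized to sum to one.
        raw = [0.5 ** i for i in range(n)]
        total = sum(raw)
        weights = [w / total for w in raw]
    if len(weights) != n:
        raise ValueError("weights and utilities must have equal length")
    if any(later > earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    return sum(w * u for w, u in zip(weights, ordered))


if __name__ == "__main__":
    # Same total return, different balance across two objectives.
    print(generalized_gini_welfare([5.0, 5.0]))
    print(generalized_gini_welfare([9.0, 1.0]))
```

Two return vectors with the same total are ranked differently: the balanced one scores higher, which is how maximizing this welfare function trades off efficiency against equity.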
MA May 25
Collaborative Threat-Aware Autonomy (CTAA)
Rajnikant Sharma, Abhinav Sinha, Isaac Weintraub
Navigating teams of unmanned vehicles through environments containing dynamic, adversarial Weapon Engagement Zones (WEZs) poses a fundamental challenge to mission success: a single vehicle, however capable its onboard guidance, remains a single point of failure. This paper presents a role-differentiated multi-agent framework for collaborative threat-aware trajectory planning in which a fleet of Autonomous Collaborative Platforms (ACPs) is assigned distinct roles (primary intercept, escort, and decoy) to improve team-level mission success probability while managing individual WEZ exposure. Each ACP independently employs a reactive guidance law derived from the Collision Sphere Boundary for Evader Zero-Set (CSBEZ), which accounts for pursuer maneuverability constraints imposed by minimum turn radius, and steers the vehicle toward the safest heading that also makes progress toward its goal. Role assignment and spatial route separation induce two complementary effects: probabilistic redundancy, in which $N$ independent paths raise the team success probability, and threat saturation, in which lower-priority escorts and decoys draw adversary attention and free the primary vehicle to transit uncontested.
LG Sep 29, 2024
Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems
Umer Siddique, Abhinav Sinha, Yongcan Cao
In this paper, we propose an adaptive event-triggered reinforcement learning control for continuous-time nonlinear systems that are subject to bounded uncertainties and characterized by complex interactions. Specifically, the proposed method is capable of jointly learning both the control policy and the communication policy, thereby reducing the number of parameters and the computational overhead compared with learning them separately or learning only one of them. By augmenting the state space with accrued rewards that represent the performance over the entire trajectory, we show that accurate and efficient determination of triggering conditions is possible without the need to explicitly learn triggering conditions, thereby leading to an adaptive non-stationary policy. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed approach.
SY Dec 5, 2017
Consensus tracking in multi agent system with nonlinear and non identical dynamics via event driven sliding modes
Abhinav Sinha, Rajiv Kumar Mishra
In this work, the leader-follower consensus objective is addressed by synthesizing an event-based controller that uses robust sliding mode control. The scheme is partitioned into two parts, viz. a finite-time consensus problem and an event-triggered control mechanism. A nonlinear multi-agent system with non-identical dynamics is used to illustrate the robustness of the proposed control. The first part matches the states of the followers with those of the leader via a consensus tracking algorithm. In the second part, an event-triggered rule is devised to save computational power and avoid periodic updating of the controller while ensuring the desired closed-loop performance of the system. Switching of the event-based controller is achieved via sliding mode control. The advantage of using a switched controller such as sliding mode control is that it retains its inherent robustness, while the event-triggering approach helps reduce energy expenditure. The efficacy of the proposed scheme is confirmed via numerical simulations.
SY Mar 24
Engagement-Zone-Aware Input-Constrained Guidance for Safe Target Interception in Contested Environments
Praveen Kumar Ranjan, Abhinav Sinha, Yongcan Cao
We address target interception in contested environments in the presence of multiple defenders whose interception capability is limited by finite ranges. Conventional methods typically impose conservative stand-off constraints based on maximum engagement distance and neglect the interceptors' actuator limitations. Instead, we formulate safety constraints using defender-induced engagement zones (EZs). To account for actuator limits, the vehicle model is augmented with input saturation dynamics. A time-varying safe-set tightening parameter is introduced to compensate for transient constraint violations induced by actuator dynamics. To ensure scalable safety enforcement in multi-defender scenarios, a smooth aggregate safety function is constructed using a log-sum-exp operator combining individual threat measures associated with each defender's capability. A smooth switching guidance strategy is then developed to coordinate interception and safety objectives. The attacker pursues the target when sufficiently distant from threat boundaries and progressively activates evasive motion as the EZ boundaries are approached. The resulting controller relies only on relative measurements and does not require knowledge of defender control inputs, thus facilitating a fully distributed and scalable implementation. Rigorous analysis provides sufficient conditions guaranteeing target interception, practical safety with respect to all defender engagement zones, and satisfaction of actuator bounds. An input-constrained guidance law based on conservative stand-off distance is also developed to quantify the conservatism of maximum-range-based safety formulations. Simulations with stationary and maneuvering defenders demonstrate that the proposed formulation yields shorter interception paths and reduced interception time compared with conventional methods while maintaining safety throughout the engagement.
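The log-sum-exp aggregation mentioned in this abstract can be sketched as a smooth under-approximation of the minimum over per-defender safety margins. The abstract does not give the paper's exact threat measures or sign conventions, so the sketch below assumes each margin is positive when the attacker is outside that defender's engagement zone. The name `smooth_min` and the `sharpness` parameter are hypothetical.

```python
import math


def smooth_min(margins, sharpness=10.0):
    """Log-sum-exp soft minimum of a list of safety margins.

    Computes -(1/k) * log(sum_i exp(-k * h_i)) in a numerically stable
    way by factoring out the smallest margin. The result never exceeds
    min(margins) and is within log(n)/k of it.
    """
    if not margins:
        raise ValueError("at least one margin is required")
    if sharpness <= 0:
        raise ValueError("sharpness must be positive")
    smallest = min(margins)
    # Every exponent is <= 0 after the shift, so exp() cannot overflow.
    total = sum(math.exp(-sharpness * (h - smallest)) for h in margins)
    return smallest - math.log(total) / sharpness


if __name__ == "__main__":
    # Assumed margins to three defenders' engagement zones.
    print(smooth_min([2.0, 0.5, 3.0], sharpness=10.0))
```

Because the soft minimum is differentiable and always at or below the true minimum, keeping it positive is a conservative single condition that implies every individual margin is positive. A larger `sharpness` reduces the conservatism at the cost of sharper gradients.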
SY May 13
Bounded-Input True Proportional Navigation for Impact-Time Control
Lohitvel Gopikannan, Shashi Ranjan Kumar, Abhinav Sinha
This paper proposes a nonlinear guidance strategy capable of intercepting a constant-velocity, non-maneuvering target while strictly satisfying the prescribed bounds on the control input (commanded acceleration). Unlike conventional strategies that estimate time-to-go using linearization or small-angle approximations, the proposed strategy employs true proportional-navigation guidance (TPNG) as a baseline, which utilizes an exact time-to-go formulation and is applicable over a wide range of target motions. In contrast to most existing strategies, which do not incorporate control input bounds into the guidance design, the proposed approach explicitly accounts for these limits by modeling the interceptor acceleration as a dynamic variable. Based on the sliding mode control technique, an effective guidance law that achieves time-constrained interception while accounting for bounded input is then derived. The performance of the proposed strategy is evaluated for various engagement scenarios.
SY Sep 24, 2025
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Umer Siddique, Abhinav Sinha, Yongcan Cao
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
SY Jun 3, 2021
Three-agent Time-constrained Cooperative Pursuit-Evasion
Abhinav Sinha, Shashi Ranjan Kumar, Dwaipayan Mukherjee
This paper considers a pursuit-evasion scenario among three agents -- an evader, a pursuer, and a defender. We design cooperative guidance laws for the evader and the defender team to safeguard the evader from an attacking pursuer. Unlike differential games, optimal control formulations, and other heuristic methods, we propose a novel perspective on designing effective nonlinear feedback control laws for the evader-defender team using a time-constrained guidance approach. The evader lures the pursuer on the collision course by offering itself as bait. At the same time, the defender protects the evader from the pursuer by exercising control over the engagement duration. Depending on the nature of the mission, the defender may choose to take an aggressive or defensive stance. Such consideration widens the applicability of the proposed methods in various three-agent motion planning scenarios such as aircraft defense, asset guarding, search and rescue, surveillance, and secure transportation. We use a fixed-time sliding mode control strategy to design the control laws for the evader-defender team and a nonlinear finite-time disturbance observer to estimate the pursuer's maneuver. Finally, we present simulations to demonstrate favorable performance under various engagement geometries, thus vindicating the efficacy of the proposed designs.
SY Nov 10, 2017
Cooperative control of multi-agent systems to locate source of an odor
Abhinav Sinha, Rishemjit Kaur, Ritesh Kumar et al.
This work targets the problem of odor source localization by multi-agent systems. A hierarchical cooperative control is put forward to locate the source of an odor by driving the agents to consensus when at least one agent obtains information about the location of the source. The proposed controller is synthesized hierarchically in three levels: group decision making, path planning, and control. The decision-making level uses information from the agents, via a conventional particle swarm algorithm, together with information about the movement of filaments to predict the location of the odor source. The predicted source location is then used to map a trajectory, and that information is passed to the control level. The distributed control layer uses sliding mode controllers, known for their inherent robustness and their ability to reject matched disturbances completely. Two cases of agent movement towards the source, i.e., under consensus and under formation, are discussed. Finally, numerical simulations demonstrate the efficacy of the proposed hierarchical distributed control.
LG May 23, 2014
Online Linear Optimization via Smoothing
Jacob Abernethy, Chansoo Lee, Abhinav Sinha et al.
We present a new optimization-theoretic approach to analyzing Follow-the-Leader style algorithms, particularly in the setting where perturbations are used as a tool for regularization. We show that adding a strongly convex penalty function to the decision rule and adding stochastic perturbations to data correspond to deterministic and stochastic smoothing operations, respectively. We establish an equivalence between "Follow the Regularized Leader" and "Follow the Perturbed Leader" up to the smoothness properties. This intuition leads to a new generic analysis framework that recovers and improves the previously known regret bounds of the class of algorithms commonly known as Follow the Perturbed Leader.
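A well-known special case gives some intuition for the correspondence between regularization and perturbation in the experts setting. Follow the Regularized Leader with an entropic regularizer plays the exponential-weights distribution, and Follow the Perturbed Leader with Gumbel noise picks each expert with exactly those probabilities (the Gumbel-max trick). The sketch below checks this numerically. It is an illustration under assumed losses and learning rate, not the paper's analysis framework, and all function names are hypothetical.

```python
import math
import random


def exponential_weights(cum_losses, eta):
    """FTRL with entropic regularization: softmax of -eta * losses."""
    shift = min(cum_losses)                      # for numerical stability
    scores = [math.exp(-eta * (loss - shift)) for loss in cum_losses]
    total = sum(scores)
    return [s / total for s in scores]


def sample_gumbel(rng):
    """Draw one standard Gumbel sample by inverse transform."""
    u = rng.random()
    while u == 0.0:                              # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def ftpl_choice_frequencies(cum_losses, eta, n_samples=20000, seed=0):
    """Empirical frequencies of the perturbed leader under Gumbel noise."""
    rng = random.Random(seed)
    counts = [0] * len(cum_losses)
    for _ in range(n_samples):
        perturbed = [-eta * loss + sample_gumbel(rng) for loss in cum_losses]
        leader = max(range(len(perturbed)), key=perturbed.__getitem__)
        counts[leader] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    losses = [1.0, 2.0, 4.0]                     # assumed cumulative losses
    print(exponential_weights(losses, eta=1.0))
    print(ftpl_choice_frequencies(losses, eta=1.0))
```

The two printed vectors should agree up to Monte Carlo error, showing that in this case the stochastic perturbation induces the same play distribution as the deterministic regularizer.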