SY Aug 19, 2019
Sample Greedy Gossip for Distributed Network-Wide Average Computation
Hyo-Sang Shin, Shaoming He, Antonios Tsourdos
This paper investigates the problem of distributed network-wide averaging and proposes a new greedy gossip algorithm. Instead of finding the optimal path of each node in a greedy manner, the proposed approach utilises a suboptimal communication path by performing greedy selection among randomly selected active local nodes. Theoretical analysis on convergence speed is also performed to investigate the characteristics of the proposed algorithm. The main feature of the new algorithm is that its stochastic sampling strategy provides great flexibility and a good balance between communication cost and convergence performance. Extensive numerical simulations are performed to validate the analytic findings.
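The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the `sample_size` parameter, the graph, and the pairwise-averaging update are all assumptions chosen for illustration. A randomly woken node samples a few of its neighbours and averages with the sampled neighbour whose value differs most from its own.

```python
import random


def sample_greedy_gossip_step(values, neighbors, sample_size, rng):
    """One asynchronous update (illustrative): a random node wakes, samples
    some neighbours, and averages with the most different sampled one."""
    i = rng.randrange(len(values))
    candidates = neighbors[i]
    k = min(sample_size, len(candidates))
    sampled = rng.sample(candidates, k)
    # Greedy selection restricted to the randomly sampled subset.
    j = max(sampled, key=lambda n: abs(values[n] - values[i]))
    mean = 0.5 * (values[i] + values[j])
    values[i] = values[j] = mean


def run_gossip(values, neighbors, sample_size, steps, seed=0):
    """Run a fixed number of updates on a copy of the initial values."""
    rng = random.Random(seed)
    values = list(values)
    for _ in range(steps):
        sample_greedy_gossip_step(values, neighbors, sample_size, rng)
    return values


def spread(values):
    """Gap between the largest and smallest node value."""
    return max(values) - min(values)


# Hypothetical 8-node graph: a ring with one extra chord per node.
neighbors = [[(i - 1) % 8, (i + 1) % 8, (i + 4) % 8] for i in range(8)]
initial = [float(i) for i in range(8)]
final = run_gossip(initial, neighbors, sample_size=2, steps=500)
```

Setting `sample_size` to 1 reduces this sketch to plain randomized gossip, while setting it to the full neighbourhood size gives purely greedy selection, which is the trade-off between communication cost and convergence that the abstract refers to.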
LG Jul 29, 2023
Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes
Saki Omi, Hyo-Sang Shin, Namhoon Cho et al.
Recent studies have greatly improved reinforcement learning, and interest in real-world implementation has grown. In many cases, the implementation is challenged by time-varying disturbances, which introduce hidden states and make the problem best described as a Partially Observable Markov Decision Process. An effective approach to address this problem is to introduce a Recurrent Neural Network (RNN) in place of a state estimator. However, only a few studies have investigated the types of information to be supplied to the RNN and the network architecture to handle them. This study discusses the effectiveness of including actions along with observations and the impact of the network architecture that handles them, by providing interpretations of how the trajectories are summarized in LSTM networks. Specifically, three novel approaches with different architectures are introduced. All algorithms demonstrated the effectiveness of including the action trajectories in simulation environments. In particular, one of the developed algorithms, H-TD3, differs from the typical actor and critic network in that the critic network is trained using the hidden states generated by the actor network as the summarized trajectory information. This novel approach showed potential to reduce computational time while maintaining performance.
OC Sep 8, 2022
Incremental Correction in Dynamic Systems Modelled with Neural Networks for Constraint Satisfaction
Namhoon Cho, Hyo-Sang Shin, Antonios Tsourdos et al.
This study presents incremental correction methods for refining neural network parameters or control functions entering into a continuous-time dynamic system to achieve improved solution accuracy in satisfying the interim point constraints placed on the performance output variables. The proposed approach is to linearise the dynamics around the baseline values of its arguments, and then to solve for the corrective input required to transfer the perturbed trajectory to precisely known or desired values at specific time points, i.e., the interim points. Depending on the type of decision variables to adjust, parameter correction and control function correction methods are developed. These incremental correction methods can be utilised as a means to compensate for the prediction errors of pre-trained neural networks in real-time applications where high accuracy of the prediction of dynamical systems at prescribed time points is imperative. In this regard, the online update approach can be useful for enhancing the overall targeting accuracy of finite-horizon control subject to point constraints using a neural policy. A numerical example demonstrates the effectiveness of the proposed approach in an application to a powered descent problem on Mars.
LG Mar 5, 2022
Bayesian Learning Approach to Model Predictive Control
Namhoon Cho, Seokwon Lee, Hyo-Sang Shin et al.
This study presents a Bayesian learning perspective on model predictive control algorithms. High-level frameworks have been developed separately in earlier studies on Bayesian learning and sampling-based model predictive control. On the one hand, the Bayesian learning rule provides a general framework capable of generating various machine learning algorithms as special instances. On the other hand, the dynamic mirror descent model predictive control framework is capable of diversifying sample-rollout-based control algorithms. However, connections between the two frameworks have still not been fully appreciated in the context of stochastic optimal control. This study brings the Bayesian learning rule point of view into the model predictive control setting, taking inspiration from the view of the model predictive controller as an online learner. The selection of the posterior class and the natural gradient approximation for the variational formulation governs the diversification of model predictive control algorithms in the Bayesian learning approach to model predictive control. This alternative viewpoint complements the dynamic mirror descent framework by streamlining the explanation of design choices.
OC Jun 20, 2023
A Passivity-Based Method for Accelerated Convex Optimisation
Namhoon Cho, Hyo-Sang Shin
This study presents a constructive methodology for designing accelerated convex optimisation algorithms in the continuous-time domain. The two key enablers are the classical concept of passivity in control theory and the time-dependent change of variables that maps the output of the internal dynamic system to the optimisation variables. The Lyapunov function associated with the optimisation dynamics is obtained as a natural consequence of specifying the internal dynamics that drive the state evolution as a passive linear time-invariant system. The passivity-based methodology provides a general framework that has the flexibility to generate convex optimisation algorithms with guarantees of different convergence rate bounds on the objective function value. The same principle applies to the design of online parameter update algorithms for adaptive control by re-defining the output of the internal dynamics to allow for the feedback interconnection with tracking error dynamics.
CV Feb 27, 2020 | Code
Target Detection, Tracking and Avoidance System for Low-cost UAVs using AI-Based Approaches
Vinorth Varatharasan, Alice Shuang Shuang Rao, Eric Toutounji et al.
This paper develops an onboard target detection, tracking and avoidance system for low-cost UAV flight controllers using AI-based approaches. The aim of the proposed system is to let an ally UAV either avoid or track an unexpected enemy UAV with a net to protect itself. From this point of view, a simple and robust target detection, tracking and avoidance system is designed. Two open-source tools were used for this purpose: a state-of-the-art object detection technique called SSD and an API for MAVLink compatible systems called MAVSDK. MAVSDK performs velocity control when a UAV is detected so that the manoeuvre is done simply and efficiently. The proposed system was verified with Software in the loop (SITL) and Hardware in the loop (HITL) simulators. The simplicity of the algorithm is its main novelty, and it makes the system well suited to future applications that need robust performance on low-cost hardware, such as delivery drones.
LG Dec 17, 2023
Automatic Optimisation of Normalised Neural Networks
Namhoon Cho, Hyo-Sang Shin
We propose automatic optimisation methods considering the geometry of the matrix manifold for the normalised parameters of neural networks. Layerwise weight normalisation with respect to the Frobenius norm is utilised to bound the Lipschitz constant and to enhance gradient reliability so that the trained networks are suitable for control applications. Our approach first initialises the network and normalises the data with respect to the $\ell^{2}$-$\ell^{2}$ gain of the initialised network. Then, the proposed algorithms take the update structure based on the exponential map on high-dimensional spheres. Given an update direction such as that of the negative Riemannian gradient, we propose two different ways to determine the stepsize for descent. The first algorithm utilises automatic differentiation of the objective function along the update curve defined on the combined manifold of spheres. The directional second-order derivative information can be utilised without requiring explicit construction of the Hessian. The second algorithm utilises the majorisation-minimisation framework via architecture-aware majorisation for neural networks. With these new developments, the proposed methods avoid manual tuning and scheduling of the learning rate, thus providing an automated pipeline for optimising normalised neural networks.
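The update structure mentioned above, the exponential map on a sphere, can be sketched as follows for a single weight matrix of unit Frobenius norm. This is an illustration under stated assumptions, not the paper's method: the function names are hypothetical, and `stepsize` is a fixed placeholder, whereas the abstract describes determining the stepsize automatically.

```python
import math


def frobenius_norm(w):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in w for x in row))


def project_to_tangent(w, g):
    """Remove the component of g along w (w is assumed to have unit norm)."""
    inner = sum(wx * gx for wr, gr in zip(w, g) for wx, gx in zip(wr, gr))
    return [[gx - inner * wx for wx, gx in zip(wr, gr)]
            for wr, gr in zip(w, g)]


def sphere_exp_step(w, grad, stepsize):
    """Follow the geodesic from w along the negative Riemannian gradient."""
    v = project_to_tangent(w, [[-gx for gx in row] for row in grad])
    v_norm = frobenius_norm(v)
    if v_norm == 0.0:
        return [row[:] for row in w]  # stationary point: nothing to do
    c = math.cos(stepsize * v_norm)
    s = math.sin(stepsize * v_norm)
    return [[c * wx + s * vx / v_norm for wx, vx in zip(wr, vr)]
            for wr, vr in zip(w, v)]


# Hypothetical unit-norm weight matrix and Euclidean gradient.
w = [[0.6, 0.0], [0.0, 0.8]]
grad = [[1.0, 2.0], [-1.0, 0.5]]
w_new = sphere_exp_step(w, grad, stepsize=0.1)
```

Because the tangent direction is orthogonal to `w`, the updated matrix stays on the unit sphere for any stepsize, so no re-normalisation is needed after the step.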
LG Jan 17, 2022
Optimisation of Structured Neural Controller Based on Continuous-Time Policy Gradient
Namhoon Cho, Hyo-Sang Shin
This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experience) while treating the tunable elements inside the given structure as the point of parametrisation with neural networks. To optimise a cost represented as a function of the neural network weights, the proposed approach utilises the continuous-time policy gradient method based on adjoint sensitivity analysis as a means for correct and performant computation of the cost gradient. This enables combining the stability, robustness, and physical interpretability of an analytically derived structure for the feedback controller with the representational flexibility and optimised performance provided by machine learning techniques. Such a hybrid paradigm for fixed-structure control synthesis is particularly useful for optimising adaptive nonlinear controllers to achieve improved performance in online operation, an area where existing theory governs the design of the structure but lacks a clear analytical understanding of how to tune the gains and the uncertainty model basis functions that determine the performance characteristics. Numerical experiments on aerospace applications illustrate the utility of the structured nonlinear controller optimisation framework.
IT Oct 25, 2021
Variational Probabilistic Multi-Hypothesis Tracking
Shuoyuan Xu, Hyo-Sang Shin, Antonios Tsourdos
This paper proposes a novel multi-target tracking (MTT) algorithm for scenarios with arbitrary numbers of measurements per target. We propose the variational probabilistic multi-hypothesis tracking (VPMHT) algorithm based on the variational Bayesian expectation-maximisation (VBEM) algorithm to resolve the MTT problem in the classic PMHT algorithm. With the introduction of variational inference, the proposed VPMHT handles track-loss much better than the conventional probabilistic multi-hypothesis tracking (PMHT) while preserving a similar or even better tracking accuracy. Extensive numerical simulations are conducted to demonstrate the effectiveness of the proposed algorithm.
LG Mar 9, 2021
A Learning-Based Computational Impact Time Guidance
Zichao Liu, Jiang Wang, Shaoming He et al.
This paper investigates the problem of impact-time control and proposes a learning-based computational guidance algorithm to solve it. The proposed guidance algorithm is developed based on a general prediction-correction concept: the exact time-to-go under proportional navigation guidance with realistic aerodynamic characteristics is estimated by a deep neural network, and a biased command to nullify the impact time error is developed using emerging reinforcement learning techniques. The deep neural network is augmented into the reinforcement learning block to resolve the issue of sparse reward that has been observed in typical reinforcement learning formulations. Extensive numerical simulations are conducted to support the proposed algorithm.
CV Feb 27, 2020
Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds
Vinorth Varatharasan, Hyo-Sang Shin, Antonios Tsourdos et al.
Neural network models are usually trained with a large dataset of images in homogeneous backgrounds. The issue is that the performance of the trained network models can degrade significantly in a complex and heterogeneous environment. To mitigate the issue, this paper develops a framework that autonomously generates a training dataset in heterogeneous cluttered backgrounds. The learning effectiveness of the proposed framework is expected to be better in complex and heterogeneous environments than that obtained with a typical dataset. In our framework, a state-of-the-art image segmentation technique called DeepLab is used to extract objects of interest from a picture, and the Chroma-key technique is then used to merge the extracted objects of interest into specific heterogeneous backgrounds. The performance of the proposed framework is investigated through empirical tests and compared with that of a model trained with the COCO dataset. The results show that the proposed framework outperforms the compared model. This implies that the learning effectiveness of the developed framework is superior to that of models trained with the typical dataset.
AI Aug 19, 2019
A Domain-Knowledge-Aided Deep Reinforcement Learning Approach for Flight Control Design
Hyo-Sang Shin, Shaoming He, Antonios Tsourdos
This paper aims to examine the potential of using emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, we suggest leveraging available domain knowledge in learning to improve learning efficiency and generalisability. More specifically, the proposed approach fixes the autopilot structure as a typical three-loop autopilot, and deep reinforcement learning is utilised to learn the autopilot gains. To solve the flight control problem, we then formulate a Markovian decision process with a proper reward function that enables the application of reinforcement learning theory. Another type of domain knowledge is exploited for defining the reward function, by shaping reference inputs in consideration of important control objectives and using the shaped reference inputs in the reward function. The state-of-the-art deep deterministic policy gradient algorithm is utilised to learn an action policy that maps the observed states to the autopilot gains. Extensive empirical numerical simulations are performed to validate the proposed computational control algorithm.
MA Jan 10, 2019
Sample Greedy Based Task Allocation for Multiple Robot Systems
Hyo-Sang Shin, Teng Li, Pau Segui-Gasco
This paper addresses the task allocation problem for multi-robot systems. The main issue with the task allocation problem is its inherent complexity, which makes finding an optimal solution within a reasonable time almost impossible. To handle the issue, this paper develops a task allocation algorithm that can be decentralised by leveraging submodularity concepts and a sampling process. The theoretical analysis reveals that the proposed algorithm can provide an approximation guarantee of $1/2$ for the monotone submodular case and $1/4$ for the non-monotone submodular case in an average sense with polynomial time complexity. To examine the performance of the proposed algorithm and validate the theoretical analysis results, we design a task allocation problem and perform numerical simulations. The simulation results confirm that the proposed algorithm achieves solution quality comparable to a state-of-the-art algorithm in the monotone case, and much better quality in the non-monotone case, with significantly less computational complexity.
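The combination of sampling and greedy selection described here can be sketched in a few lines. This is only an illustration of the sample-then-greedy idea and does not reproduce the paper's algorithm or its guarantees: the utility function, the `sample_prob` parameter, and the score table are all assumptions. Each (robot, task) pair is kept with a fixed probability, and greedy selection by marginal gain is then run on the surviving pairs.

```python
import math
import random


def total_value(assignment, score):
    """Illustrative monotone submodular utility: each robot earns the
    square root of the summed scores of the tasks assigned to it."""
    per_robot = {}
    for task, robot in assignment.items():
        per_robot[robot] = per_robot.get(robot, 0.0) + score[robot][task]
    return sum(math.sqrt(s) for s in per_robot.values())


def sample_greedy_allocation(score, sample_prob, seed=0):
    """Sample (robot, task) pairs, then greedily assign by marginal gain."""
    rng = random.Random(seed)
    n_robots, n_tasks = len(score), len(score[0])
    # Sampling phase: keep each pair independently with sample_prob.
    pool = [(r, t) for r in range(n_robots) for t in range(n_tasks)
            if rng.random() < sample_prob]
    assignment = {}  # maps task -> robot
    while True:
        base = total_value(assignment, score)
        best_gain, best_pair = 0.0, None
        for r, t in pool:
            if t in assignment:
                continue  # each task goes to at most one robot
            gain = total_value({**assignment, t: r}, score) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (r, t)
        if best_pair is None:
            return assignment
        assignment[best_pair[1]] = best_pair[0]


# Hypothetical scores: score[robot][task].
score = [[4.0, 1.0, 3.0], [2.0, 5.0, 1.0]]
sampled = sample_greedy_allocation(score, sample_prob=0.5)
full = sample_greedy_allocation(score, sample_prob=1.0)
```

With `sample_prob=1.0` the pool contains every pair and the sketch reduces to plain greedy; smaller values shrink the pool and therefore the number of marginal-gain evaluations.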
MA Nov 18, 2017
Anonymous Hedonic Game for Task Allocation in a Large-Scale Multiple Agent System
Inmo Jang, Hyo-Sang Shin, Antonios Tsourdos
This paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation of self-interested agents, and show that our proposed decentralized algorithm guarantees convergence of agents with social inhibition to a Nash stable partition (i.e., social agreement) within polynomial time. The algorithm is simple and executable based on local interactions with neighbor agents under a strongly-connected communication network and even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound of suboptimality of the solution, and additionally show that at least 50% of suboptimality can be guaranteed if social utilities are non-decreasing functions with respect to the number of co-working agents. The results of numerical experiments confirm that the proposed framework is scalable, quickly adaptable to dynamic environments, and robust even in a realistic situation.
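The notion of a Nash stable partition can be made concrete with a small sketch. It uses centralised, sequential best-response dynamics rather than the decentralised algorithm of the paper, and the utility `rewards[task] / participants` is a hypothetical stand-in for an anonymous utility with social inhibition, meaning that an agent's payoff depends only on how many agents share its task and decreases as more join.

```python
def utility(rewards, task, participants):
    """Anonymous utility with social inhibition (illustrative choice)."""
    return rewards[task] / participants


def find_nash_stable_partition(n_agents, rewards, max_rounds=1000):
    """Agents take turns switching tasks until nobody wants to move."""
    n_tasks = len(rewards)
    choice = [0] * n_agents  # every agent starts on task 0
    counts = [0] * n_tasks
    counts[0] = n_agents
    for _ in range(max_rounds):
        moved = False
        for agent in range(n_agents):
            current = choice[agent]
            best_task = current
            best_u = utility(rewards, current, counts[current])
            for task in range(n_tasks):
                if task == current:
                    continue
                # Payoff the agent would get after joining this task.
                u = utility(rewards, task, counts[task] + 1)
                if u > best_u:
                    best_task, best_u = task, u
            if best_task != current:
                counts[current] -= 1
                counts[best_task] += 1
                choice[agent] = best_task
                moved = True
        if not moved:
            return choice, counts
    raise RuntimeError("no stable partition found within max_rounds")


def is_nash_stable(choice, counts, rewards):
    """True if no single agent gains by unilaterally switching tasks."""
    for current in choice:
        here = utility(rewards, current, counts[current])
        for task in range(len(rewards)):
            if task != current and utility(rewards, task, counts[task] + 1) > here:
                return False
    return True


# Hypothetical instance: six agents, three tasks with different rewards.
rewards = [10.0, 6.0, 3.0]
choice, counts = find_nash_stable_partition(6, rewards)
```

Because every agent shares the same utility table in this sketch, each switch strictly increases a potential function, so the loop terminates; the paper's setting and proof are more general.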