OC Jan 24, 2013
Modelling and Control of Blowing-Venting Operations in Manned Submarines
Roberto Font, Javier Garcia, Jose Alberto Murillo et al.
Motivated by the potential use of blowing and venting operations of ballast tanks in manned submarines as a complementary or alternative control system for manoeuvring, we first propose a mathematical model for these operations. We then couple blowing and venting with the Feldman variable-mass, coefficient-based hydrodynamic model for the equations of motion. The complete model is a system of twenty-four nonlinear ordinary differential equations. In the second part, we carry out a rigorous mathematical analysis of the model and prove existence of a solution. As one possible application of this model to naval engineering problems, we consider roll control in an emergency rising manoeuvre using only blowing and venting. To this end, we formulate a constrained, nonlinear optimal control problem in which the controls are linked to the variable aperture of the blowing and venting valves of each tank. Existence of a solution for this problem is also proved. Finally, we address the numerical resolution of the control problem with a descent algorithm. Numerical experiments suggest that an appropriate use of blowing and venting operations may help control this emergency manoeuvre.
LG Feb 4, 2014
Safe Exploration of State and Action Spaces in Reinforcement Learning
Javier Garcia, Fernando Fernandez
In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, the need for safe and efficient exploration poses an additional challenge. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial-and-error process may select actions whose execution in some states damages the learning system (or any other system). Consequently, when an agent begins interacting with a dangerous, high-dimensional state-action space, an important question arises: how to avoid (or at least minimize) the damage caused by exploring that space. We introduce the PI-SRL algorithm, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and learns efficiently from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.