H. Levent Akın

h-index16

3papers

37citations

Novelty25%

AI Score20

Ranked #185,311 of 194,257 authors (top 95%)#11,647 in AI (top 93%)

3 Papers

6.0LGSep 3, 2019

Generalization in Transfer Learning

Suzan Ece Ada, Emre Ugur, H. Levent Akin

Agents trained with deep reinforcement learning algorithms are capable of performing highly complex tasks including locomotion in continuous environments. We investigate transferring the learning acquired in one task to a set of previously unseen tasks. Generalization and overfitting in deep reinforcement learning are not commonly addressed in current transfer learning research. Conducting a comparative analysis without an intermediate regularization step results in underperforming benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this study, we propose regularization techniques in deep reinforcement learning for continuous control through the application of sample elimination, early stopping and maximum entropy regularized adversarial learning. First, the importance of the inclusion of training iteration number to the hyperparameters in deep transfer reinforcement learning will be discussed. Because source task performance is not indicative of the generalization capacity of the algorithm, we start by acknowledging the training iteration number as a hyperparameter. In line with this, we introduce an additional step of resorting to earlier snapshots of policy parameters to prevent overfitting to the source task. Then, to generate robust policies, we discard the samples that lead to overfitting via a method we call strict clipping. Furthermore, we increase the generalization capacity in widely used transfer learning benchmarks by using maximum entropy regularization, different critic methods, and curriculum learning in an adversarial setup. Subsequently, we propose maximum entropy adversarial reinforcement learning to increase the domain randomization. Finally, we evaluate the robustness of these methods on simulated robots in target environments where the morphology of the robot, gravity, and tangential friction coefficient of the environment are altered.

2.9ROJun 28, 2018

End-to-End Deep Imitation Learning: Robot Soccer Case Study

Okan Aşık, Binnur Görer, H. Levent Akın

In imitation learning, behavior learning is generally done using the features extracted from the demonstration data. Recent deep learning algorithms enable the development of machine learning methods that can get high dimensional data as an input. In this work, we use imitation learning to teach the robot to dribble the ball to the goal. We use B-Human robot software to collect demonstration data and a deep convolutional network to represent the policies. We use top and bottom camera images of the robot as input and speed commands as outputs. The CNN policy learns the mapping between the series of images and speed commands. In 3D realistic robotics simulator experiments, we show that the robot is able to learn to search the ball and dribble the ball, but it struggles to align to the goal. The best-proposed policy model learns to score 4 goals out of 20 test episodes.

2.5AIJun 4, 2016Code

Effective Multi-Robot Spatial Task Allocation using Model Approximations

Okan Aşık, H. Levent Akın

Real-world multi-agent planning problems cannot be solved using decision-theoretic planning methods due to the exponential complexity. We approximate firefighting in rescue simulation as a spatially distributed task and model with multi-agent Markov decision process. We use recent approximation methods for spatial task problems to reduce the model complexity. Our approximations are single-agent, static task, shortest path pruning, dynamic planning horizon, and task clustering. We create scenarios from RoboCup Rescue Simulation maps and evaluate our methods on these graph worlds. The results show that our approach is faster and better than comparable methods and has negligible performance loss compared to the optimal policy. We also show that our method has a similar performance as DCOP methods on example RCRS scenarios.