LG · May 2, 2022
Exploration in Deep Reinforcement Learning: A Survey
Pawel Ladosz, Lilian Weng, Minwoo Kim et al.
This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. In sparse reward problems, rewards are rare, so an agent acting randomly will seldom encounter them. In such a scenario, it is challenging for reinforcement learning to learn the association between actions and rewards. Thus, more sophisticated exploration methods need to be devised. This review provides a comprehensive overview of existing exploration approaches, categorized by their key contributions as follows: reward novel states, reward diverse behaviours, goal-based methods, probabilistic methods, imitation-based methods, safe exploration and random-based methods. Then, the unsolved challenges are discussed to provide valuable future research directions. Finally, the approaches of different categories are compared in terms of complexity, computational effort and overall performance.
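The first category, rewarding novel states, is often illustrated with a count-based bonus that shrinks each time a state is revisited. Below is a minimal Python sketch, assuming a small discrete (hashable) state space and a bonus of the form β/√N(s). `CountBonus` and `shaped_reward` are hypothetical names, and this is one common instance of the idea rather than a specific method from the survey.

```python
import math
from collections import defaultdict


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)) for hashable states (hypothetical helper)."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count N(s) per state

    def __call__(self, state) -> float:
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic: float, state, bonus: CountBonus) -> float:
    # Sparse extrinsic reward plus the novelty bonus; rarely visited states pay more.
    return extrinsic + bonus(state)


bonus = CountBonus(beta=1.0)
print([round(bonus("s0"), 3) for _ in range(3)])  # [1.0, 0.707, 0.577]
```

Because the bonus decays with visits, an agent maximising the shaped reward is pushed toward states it has seen least, even while the extrinsic reward is still zero everywhere.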
LG · Jan 21, 2023
The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning
Andrea Soltoggio, Eseoghene Ben-Iwhiwhu, Christos Peridis et al.
This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity. The environment provides 1D or 2D categorical observations and takes actions as input. The core structure of the CT-graph is a multi-branch tree graph with arbitrary branching factor, depth, and observation sets, all of which can be varied to increase the dimensions of the problem in a controllable and measurable way. Two main categories of states, decision states and wait states, are devised to create a hierarchy of importance among observations, as is typical of real-world problems. A large observation set can produce a vast set of histories that impairs memory-augmented agents. Variable reward functions allow multiple tasks to be created easily and make it possible to assess how efficiently an agent adapts in dynamic scenarios where tasks with controllable degrees of similarity are presented. Challenging complexity levels can be easily achieved owing to the exponential growth of the graph. The problem formulation and accompanying code provide a fast, transparent, and mathematically defined set of configurable tests to compare the performance of reinforcement learning algorithms, in particular in lifelong learning settings.
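The exponential growth of the graph can be made concrete with a small count. This is a minimal Python sketch of a simplified tree, assuming that only decision states require a choice (wait states are omitted) and that a single leaf is rewarded. `leaf_paths` and `random_success_rate` are hypothetical helpers and are not part of the CT-graph code.

```python
import itertools
import random


def leaf_paths(branching: int, depth: int) -> list:
    """Every root-to-leaf action sequence: one branch choice per decision state."""
    return list(itertools.product(range(branching), repeat=depth))


def random_success_rate(branching: int, depth: int, goal: tuple,
                        episodes: int = 10_000, seed: int = 0) -> float:
    """Fraction of episodes in which a uniformly random agent reaches the rewarded leaf."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(episodes):
        path = tuple(rng.randrange(branching) for _ in range(depth))
        hits += path == goal
    return hits / episodes


for b, d in [(2, 2), (2, 4), (3, 4)]:
    n = len(leaf_paths(b, d))
    print(f"branching={b} depth={d}: {n} leaves, random hit rate ~ {1 / n:.4f}")
```

In this simplified model a tree with branching factor b and depth d has b^d leaves, so the chance that a random agent finds one rewarded leaf falls as b^-d. Moving the reward to a different leaf gives a new task on the same graph.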
CV · Apr 23
FLARE-BO: Fused Luminance and Adaptive Retinex Enhancement via Bayesian Optimisation for Low-Light Robotic Vision
Nathan Shankar, Pawel Ladosz, Hujun Yin
Reliable visual perception under low illumination remains a core challenge for autonomous robotic systems, where degraded image quality directly compromises navigation, inspection, and other operations. A recent training-free approach showed that Bayesian optimisation with Gaussian Processes can adaptively select brightness, contrast, and denoising parameters on a per-image basis, achieving competitive enhancement without any learned model. However, that framework is limited to three parameters, applies no illumination decomposition or white balance correction, and relies on Non-Local Means (NLM) denoising, which tends to oversmooth edges under noisy conditions. This paper proposes FLARE-BO (Fused Luminance and Adaptive Retinex Enhancement via Bayesian Optimisation), an extended framework that jointly optimises eight parameters spanning gamma correction, LIME-style illumination normalisation, chrominance denoising, bilateral filtering, NLM denoising, Grey-World automatic white balance, and adaptive post-smoothing. The search engine employs unit-hypercube parameter normalisation, objective standardisation, Sobol quasi-random initialisation, and Log Expected Improvement acquisition for principled exploration of the expanded space. The proposed method is benchmarked on the paired Low-Light (LOL) dataset, and the results show marked improvements over existing methods that were not specifically trained on this dataset.
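Two of the stages listed above, gamma correction and Grey-World white balance, have simple closed forms. This is a minimal NumPy sketch, assuming float RGB images scaled to [0, 1] with channels last. The function names are hypothetical, and in FLARE-BO parameters such as `gamma` would be chosen by the Bayesian optimiser rather than fixed by hand as they are here.

```python
import numpy as np


def gamma_correct(img: np.ndarray, gamma: float) -> np.ndarray:
    """Power-law correction; with values in [0, 1], gamma < 1 brightens dark pixels."""
    return np.clip(img, 0.0, 1.0) ** gamma


def grey_world(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each channel so that its mean matches the mean over all channels."""
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)  # shape (C,)
    gains = channel_means.mean() / (channel_means + eps)
    return np.clip(img * gains, 0.0, 1.0)


# A dark, blue-tinted 2x2 test image (hypothetical values).
dark = np.full((2, 2, 3), [0.05, 0.05, 0.15])
balanced = grey_world(gamma_correct(dark, gamma=0.5))
print(balanced.reshape(-1, 3).mean(axis=0))
```

Grey-World assumes the average scene colour is neutral grey. It therefore multiplies each channel by the ratio of the overall mean to that channel's mean, which removes the blue cast in the example.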
RO · Oct 6, 2025
CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery
Nathan Shankar, Pawel Ladosz, Hujun Yin
This paper presents a novel approach for enabling robust robotic perception in dark environments using the infrared (IR) stream. The IR stream is less susceptible to noise than RGB in low-light conditions. However, it is dominated by active emitter patterns that hinder high-level tasks such as object detection, tracking and localisation. To address this, a U-Net-based architecture is proposed that reconstructs clean IR images from emitter-populated input, improving both image quality and downstream robotic performance. This approach outperforms existing enhancement techniques and enables reliable operation of vision-driven robotic systems across illumination conditions, from well-lit to extreme low-light scenes.
NE · Apr 27, 2020
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems
Eseoghene Ben-Iwhiwhu, Pawel Ladosz, Jeffery Dick et al.
Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, by contrast, can incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed to specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as the detection of cues that reveal implicit rewards and the ability to evolve location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
LG · Sep 21, 2019
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Pawel Ladosz, Eseoghene Ben-Iwhiwhu, Jeffery Dick et al.
This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call the modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems that impair temporal difference (TD)-based RL algorithms such as DQN, because the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQN's results and even outperformed control tests with the A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.
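The key idea of a Hebbian association held in decaying neural traces and later gated by a reward signal can be sketched in a few lines. This is a minimal NumPy illustration under assumed update rules: a decaying outer-product eligibility trace that is written into the weights, scaled by a scalar modulatory signal. `ModulatedHebbianLayer` and its parameters are hypothetical, and this is not the MOHN update used in the paper.

```python
import numpy as np


class ModulatedHebbianLayer:
    """Linear layer whose weights change via a reward-gated Hebbian trace (hypothetical sketch)."""

    def __init__(self, n_in: int, n_out: int, lr: float = 0.1,
                 trace_decay: float = 0.9, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.01, size=(n_out, n_in))
        self.trace = np.zeros((n_out, n_in))
        self.lr = lr
        self.trace_decay = trace_decay

    def forward(self, x: np.ndarray) -> np.ndarray:
        y = np.tanh(self.w @ x)
        # Pre/post co-activity accumulates in a decaying trace, so a past
        # association is still available when a delayed reward arrives.
        self.trace = self.trace_decay * self.trace + np.outer(y, x)
        return y

    def modulate(self, signal: float) -> None:
        # A scalar modulatory signal (e.g. reward) decides how much of the
        # stored trace is written into the weights; zero leaves them unchanged.
        self.w += self.lr * signal * self.trace


layer = ModulatedHebbianLayer(n_in=3, n_out=2)
layer.forward(np.array([1.0, 0.0, 0.0]))  # cue seen early in the episode
for _ in range(3):
    layer.forward(np.zeros(3))            # delay: the trace decays but persists
layer.modulate(1.0)                       # late reward strengthens the cue's weights
```

Only the weights attached to the cue input change when the reward arrives, because the trace recorded co-activity for that input alone. This illustrates how a trace can bridge the gap between an action and a distal reward.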