Humphrey Munn

RO
h-index2
3papers
3citations
Novelty43%
AI Score40

3 Papers

NEMay 13, 2022
Towards Understanding the Link Between Modularity and Performance in Neural Networks for Reinforcement Learning

Humphrey Munn, Marcus Gallagher

Modularity has been widely studied as a mechanism to improve the capabilities of neural networks through various techniques such as hand-crafted modular architectures and automatic approaches. While these methods have sometimes shown improvements towards generalisation ability, robustness, and efficiency, the mechanisms that enable modularity to give performance advantages are unclear. In this paper, we investigate this issue and find that the amount of network modularity for optimal performance is likely entangled in complex relationships between many other features of the network and problem environment. Therefore, direct optimisation or arbitrary designation of a suitable amount of modularity in neural networks may not be beneficial. We used a classic neuroevolutionary algorithm which enables rich, automatic optimisation and exploration of neural network architectures and weights with varying levels of modularity. The structural modularity and performance of networks generated by the NeuroEvolution of Augmenting Topologies algorithm was assessed on three reinforcement learning tasks, with and without an additional modularity objective. The results of the quality-diversity optimisation algorithm, MAP-Elites, suggest intricate conditional relationships between modularity, performance, and other predefined network features.

ROSep 18, 2025Code
Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution

Humphrey Munn, Brendan Tidd, Peter Böhm et al.

Reinforcement Learning (RL) robot controllers usually aggregate many task objectives into one scalar reward. While large-scale proximal policy optimisation (PPO) has enabled impressive results such as robust robot locomotion in the real world, many tasks still require careful reward tuning and are brittle to local optima. Tuning cost and sub-optimality grow with the number of objectives, limiting scalability. Modelling reward vectors and their trade-offs can address these issues; however, multi-objective methods remain underused in RL for robotics because of computational cost and optimisation difficulty. In this work, we investigate the conflict between gradient contributions for each objective that emerge from scalarising the task objectives. In particular, we explicitly address the conflict between task-based rewards and terms that regularise the policy towards realistic behaviour. We propose GCR-PPO, a modification to actor-critic optimisation that decomposes the actor update into objective-wise gradients using a multi-headed critic and resolves conflicts based on the objective priority. Our methodology, GCR-PPO, is evaluated on the well-known IsaacLab manipulation and locomotion benchmarks and additional multi-objective modifications on two related tasks. We show superior scalability compared to parallel PPO (p = 0.04), without significant computational overhead. We also show higher performance with more conflicting tasks. GCR-PPO improves on large-scale PPO with an average improvement of 9.5%, with high-conflict tasks observing a greater improvement. The code is available at https://github.com/humphreymunn/GCR-PPO.

ROFeb 2
RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots

Humphrey Munn, Brendan Tidd, Peter Bohm et al.

Deploying learned control policies on humanoid robots is challenging: policies that appear robust in simulation can execute confidently in out-of-distribution (OOD) states after Sim-to-Real transfer, leading to silent failures that risk hardware damage. Although anomaly detection can mitigate these failures, prior methods are often incompatible with high-rate control, poorly calibrated at the extremely low false-positive rates required for practical deployment, or operate as black boxes that provide a binary stop signal without explaining why the robot drifted from nominal behavior. We present RAPT, a lightweight, self-supervised deployment-time monitor for 50Hz humanoid control. RAPT learns a probabilistic spatio-temporal manifold of nominal execution from simulation and evaluates execution-time predictive deviation as a calibrated, per-dimension signal. This yields (i) reliable online OOD detection under strict false-positive constraints and (ii) a continuous, interpretable measure of Sim-to-Real mismatch that can be tracked over time to quantify how far deployment has drifted from training. Beyond detection, we introduce an automated post-hoc root-cause analysis pipeline that combines gradient-based temporal saliency derived from RAPT's reconstruction objective with LLM-based reasoning conditioned on saliency and joint kinematics to produce semantic failure diagnoses in a zero-shot setting. We evaluate RAPT on a Unitree G1 humanoid across four complex tasks in simulation and on physical hardware. In large-scale simulation, RAPT improves True Positive Rate (TPR) by 37% over the strongest baseline at a fixed episode-level false positive rate of 0.5%. On real-world deployments, RAPT achieves a 12.5% TPR improvement and provides actionable interpretability, reaching 75% root-cause classification accuracy across 16 real-world failures using only proprioceptive data.