Insect-inspired modular architectures as inductive biases for reinforcement learning

arXiv:2604.22081 · 27.6 · h-index: 15

Predicted impact: top 76% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For RL practitioners in continuous control, this work demonstrates that biologically inspired modular architectures can improve performance on tasks with dynamically competing objectives, though the gains are demonstrated only on a single synthetic navigation task.

The paper introduces a modular RL policy architecture inspired by insect neural circuits, which decomposes control into specialized modules with a learned arbitration mechanism. On a 2D navigation task with competing objectives, the modular policy achieves a final episodic return of -2798.8±964.4, outperforming centralized GRU (-3778.0±628.1) and MLP (-4727.5±772.5) baselines.

Most reinforcement-learning (RL) controllers used in continuous control are architecturally centralized: observations are compressed into a single latent state from which both value estimates and actions are produced. Biological control systems are often organized differently. Insects, in particular, coordinate navigation, heading stabilization, memory, and context-dependent action selection through distributed circuits rather than a single monolithic controller. Motivated by this contrast, we study an RL policy architecture that decomposes control into interacting modules for sensory encoding, heading representation, sparse associative memory, recurrent command generation, and local motor control, with a learned arbitration mechanism that allocates motor authority across modules. The model is evaluated on a two-dimensional navigation task that requires simultaneous food seeking, obstacle avoidance, and predator escape. In a six-seed predator-navigation experiment trained with Proximal Policy Optimization (PPO) for 75 updates, the modular policy achieves the strongest final mean performance among the tested controllers, with final episodic return $-2798.8\pm964.4$ versus $-3778.0\pm628.1$ for a centralized gated recurrent unit (GRU) and $-4727.5\pm772.5$ for a centralized multilayer perceptron (MLP). The modular policy also attains the lowest final value loss and stable PPO optimization statistics while driving module-assignment entropy to $0.0457\pm0.0244$, indicating highly selective control allocation. These results suggest that distributed control can serve as a useful inductive bias for RL problems involving dynamically competing behavioral objectives.
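
To make the arbitration and entropy statements concrete, the following is a minimal sketch of one way a learned arbiter could blend per-module action proposals and how an assignment entropy could be measured. The abstract does not specify the form of the arbitration mechanism, so the softmax gate, the function names (`softmax`, `entropy`, `arbitrate`), the three example modules, and the use of natural-log entropy are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of arbitration logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(weights):
    """Shannon entropy in nats (assumed unit) of the module-assignment weights."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)

def arbitrate(logits, proposals):
    """Blend per-module action proposals using softmax weights.

    logits:    one score per module (hypothetical arbiter output)
    proposals: one action vector per module, all of equal length
    """
    weights = softmax(logits)
    dim = len(proposals[0])
    action = [sum(w * p[i] for w, p in zip(weights, proposals)) for i in range(dim)]
    return action, weights

# Hypothetical 2D action proposals from three illustrative modules,
# e.g. food seeking, obstacle avoidance, predator escape.
proposals = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Undecided arbiter: equal logits give uniform weights and maximal entropy.
uniform_action, uniform_w = arbitrate([0.0, 0.0, 0.0], proposals)

# Selective arbiter: one dominant logit hands nearly all authority to one module.
selective_action, selective_w = arbitrate([8.0, 0.0, 0.0], proposals)

print("uniform entropy:  ", entropy(uniform_w))
print("selective entropy:", entropy(selective_w))
```

Under these assumptions, a uniform allocation over $K$ modules has entropy $\ln K$, while an allocation concentrated on a single module has entropy near zero. A value close to zero, such as the reported $0.0457$, is therefore what one would expect when nearly all motor authority goes to one module at a time.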
