LG AI · Jun 22, 2025

On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization

arXiv:2507.01039v2 · 1 citation · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the stability and convergence of explainable neuro-fuzzy agents trained with reinforcement learning, but it is incremental, since it applies an existing method (PPO) to a specific controller type.

The paper tackles the training of neuro-fuzzy controllers for reinforcement learning by using Proximal Policy Optimization (PPO) instead of Deep Q-Networks (DQN); the resulting agents reach the maximum return of 500 with zero variance on CartPole-v1 after 20000 updates.
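Switching from DQN to PPO replaces value-based Q-learning with a clipped policy-gradient update driven by advantage estimates from a critic. The following is a minimal pure-Python sketch of the standard PPO clipped surrogate objective together with generalized advantage estimation (GAE), one common way to compute those advantages. The function names `gae` and `ppo_clip_objective`, the defaults `clip_eps=0.2`, `gamma=0.99`, `lam=0.95`, and the use of GAE itself are assumptions for illustration, not details taken from the paper.

```python
import math


def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout (assumed setup).

    dones[t] is True if the episode ended after step t, in which case the
    value of the next state is not bootstrapped.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error for step t.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]  # critic targets
    return advantages, returns


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective, to be maximized by the actor."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


adv, ret = gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
               [False, False, True], gamma=1.0, lam=1.0)
print(adv)  # [3.0, 2.0, 1.0]
```

The clipping keeps each update close to the policy that collected the data, which is the property usually credited for PPO's stability relative to off-policy value-based training.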

We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
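In an ANFIS-based actor, a fuzzy rule base takes the place of the usual neural network policy. The following is a minimal sketch of one possible forward pass, assuming a first-order Takagi-Sugeno structure: Gaussian membership functions, a product t-norm for rule firing, normalized firing strengths, and one linear consequent per rule and action whose weighted sum is passed through a softmax. The class `AnfisActor`, the rule count, and the random initialization are hypothetical and may differ from the controller used in the paper; only the CartPole-v1 dimensions (4 observations, 2 actions) come from the environment itself.

```python
import math
import random


def gaussian_mf(x, center, sigma):
    """Degree to which scalar x belongs to a Gaussian fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


class AnfisActor:
    """Hypothetical first-order Takagi-Sugeno ANFIS actor.

    Each rule has one Gaussian membership function per input and one
    linear consequent per discrete action; the firing-weighted consequents
    are used as action logits.
    """

    def __init__(self, n_inputs, n_rules, n_actions, seed=0):
        rng = random.Random(seed)
        self.centers = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                        for _ in range(n_rules)]
        self.sigmas = [[1.0] * n_inputs for _ in range(n_rules)]
        # consequents[r][a] holds n_inputs weights followed by a bias term.
        self.consequents = [[[rng.gauss(0.0, 0.1) for _ in range(n_inputs + 1)]
                             for _ in range(n_actions)]
                            for _ in range(n_rules)]
        self.n_actions = n_actions

    def normalized_firing(self, state):
        """Rule firing strengths (product t-norm), normalized to sum to 1."""
        strengths = []
        for centers, sigmas in zip(self.centers, self.sigmas):
            w = 1.0
            for x, c, s in zip(state, centers, sigmas):
                w *= gaussian_mf(x, c, s)
            strengths.append(w)
        total = sum(strengths) + 1e-12  # guard against all rules underflowing
        return [w / total for w in strengths]

    def action_probs(self, state):
        """Softmax over the firing-weighted linear consequents."""
        w_bar = self.normalized_firing(state)
        logits = [0.0] * self.n_actions
        for w, rule_consequents in zip(w_bar, self.consequents):
            for a, params in enumerate(rule_consequents):
                f = sum(p * x for p, x in zip(params[:-1], state)) + params[-1]
                logits[a] += w * f
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# CartPole-v1 has a 4-dimensional observation and 2 discrete actions.
actor = AnfisActor(n_inputs=4, n_rules=8, n_actions=2, seed=0)
probs = actor.action_probs([0.01, -0.02, 0.03, 0.0])
print(probs)
```

In a PPO setup, the log-probability of the sampled action would enter the clipped objective, while a separate critic estimates state values; training would then adjust the centers, widths, and consequent parameters by gradient ascent, which this sketch omits. Because each rule keeps an interpretable antecedent (a center and width per input), the trained policy can still be read as a set of fuzzy IF-THEN rules.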

