LG · AI · Nov 25, 2020

Backpropagation-Free Learning Method for Correlated Fuzzy Neural Networks

Armin Salimi-Badr, Mohammad Mehdi Ebadzadeh

arXiv:2012.01935v2 · 5.8 · 27 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of vanishing gradients and local optima in training fuzzy neural networks, which is significant for researchers and practitioners working with these models.

This paper introduces a backpropagation-free learning method for correlated fuzzy neural networks, which estimates desired premise part outputs via a constrained optimization problem. This approach successfully avoids vanishing gradients and local optima, demonstrating superior performance and a more parsimonious structure compared to other methods in time-series prediction and regression.
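
The central idea here, estimating desired premise outputs through a constrained optimization instead of backpropagating the output error, can be illustrated with a minimal sketch. The exact objective and constraints of the paper are not stated on this page, so this sketch rests on hypothetical assumptions: the premise outputs are treated as normalized firing strengths that must lie on the probability simplex, and the "near-best" values minimize the squared output error plus a proximity penalty `lam` that keeps them close to the current strengths. The functions `project_to_simplex` and `estimate_desired_strengths` are illustrative names, not taken from the paper.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]      # largest k satisfying the support condition
    theta = (css[rho - 1] - 1.0) / rho        # common shift applied to every entry
    return np.maximum(v - theta, 0.0)


def estimate_desired_strengths(phi_current, f, y, lam=0.1, iters=500):
    """Hypothetical stand-in for the paper's constrained problem:

        minimize   0.5 * (y - phi @ f)**2 + 0.5 * lam * ||phi - phi_current||**2
        subject to phi >= 0, sum(phi) == 1

    where f holds the consequent outputs f_i(x) of each rule for one sample.
    Solved by projected gradient descent with step 1/L.
    """
    step = 1.0 / (f @ f + lam)                # L = largest eigenvalue of f f^T + lam * I
    phi = phi_current.copy()
    for _ in range(iters):
        grad = -(y - phi @ f) * f + lam * (phi - phi_current)
        phi = project_to_simplex(phi - step * grad)
    return phi


# Usage: three rules whose consequents output 1, 2 and 4; target output 3.5.
f = np.array([1.0, 2.0, 4.0])
phi0 = np.full(3, 1.0 / 3.0)                  # current normalized firing strengths
phi_d = estimate_desired_strengths(phi0, f, y=3.5)
```

In this example the estimate shifts most of the firing strength toward the third rule, moving the weighted output from about 2.33 to roughly 3.48. The proximity term is what keeps the target "near" the current strengths: without it, any point on the simplex that reproduces 3.5 exactly would be an equally good answer, and the premise parameters could be pushed toward an arbitrarily distant target.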

In this paper, a novel stepwise learning approach is proposed, based on estimating the desired outputs of the premise parts by solving a constrained optimization problem. This learning approach does not require backpropagating the output error to learn the premise parts' parameters. Instead, near-best output values of the rules' premise parts are estimated, and their parameters are adjusted to reduce the error between the current premise-part outputs and the estimated desired ones. The proposed learning method therefore avoids error backpropagation, which can cause vanishing gradients and, consequently, convergence to a local optimum. The proposed method does not require a dedicated initialization procedure. This learning method is used to train a new Takagi-Sugeno-Kang (TSK) fuzzy neural network with correlated fuzzy rules, which has many parameters in both its premise and consequent parts, while avoiding getting stuck in a local optimum due to vanishing gradients. To learn the parameters of the proposed network, a constrained optimization problem is first introduced and solved to estimate the desired output values of the premise parts. Next, the error between these desired values and the current ones is used to adapt the premise parts' parameters with the gradient-descent (GD) approach. Afterward, the error between the desired and actual network outputs is used to learn the consequent parts' parameters, again by GD. The proposed paradigm is successfully applied to real-world time-series prediction and regression problems. According to the experimental results, it outperforms other methods while having a more parsimonious structure.
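
The two gradient-descent stages described above (premise parameters fitted to desired premise outputs, then consequent parameters fitted to the output error) can be sketched for a first-order TSK network. This is a hypothetical sketch, not the authors' implementation: it assumes each correlated rule uses a full-covariance Gaussian membership mu_i(x) = exp(-0.5 * ||A_i (x - c_i)||^2) with a lower-triangular factor A_i, linear consequents, and normalized firing strengths in the output. The class and method names are illustrative, and the desired premise outputs are simply passed in rather than produced by the paper's constrained optimization.

```python
import numpy as np


class CorrelatedTSK:
    """Hypothetical first-order TSK network with correlated (full-covariance) rules.

    Rule i fires with mu_i(x) = exp(-0.5 * ||A_i (x - c_i)||^2); the lower-triangular
    A_i gives the inverse covariance A_i^T A_i, so input dimensions may be correlated.
    Consequents are linear: f_i(x) = W_i . x + b_i.
    """

    def __init__(self, n_rules, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.c = rng.normal(size=(n_rules, n_inputs))              # rule centers
        self.A = np.tile(np.eye(n_inputs), (n_rules, 1, 1))        # lower-triangular factors
        self.W = rng.normal(scale=0.1, size=(n_rules, n_inputs))   # consequent weights
        self.b = np.zeros(n_rules)                                 # consequent biases
        self.mask = np.tril(np.ones((n_inputs, n_inputs)))         # keeps each A_i lower-triangular

    def premise(self, x):
        """Return firing strengths mu, transformed offsets z_i = A_i (x - c_i), and offsets d_i."""
        d = x - self.c
        z = np.einsum("rij,rj->ri", self.A, d)
        mu = np.exp(-0.5 * np.sum(z * z, axis=1))
        return mu, z, d

    def predict(self, x):
        mu, _, _ = self.premise(x)
        phi = mu / mu.sum()                                        # normalized firing strengths
        return phi @ (self.W @ x + self.b)

    def premise_step(self, x, mu_desired, lr=0.01):
        """One GD step on E_p = 0.5 * sum_i (mu_i - mu_desired_i)^2; the output error is not used."""
        mu, z, d = self.premise(x)
        scale = (mu - mu_desired) * mu                             # dE_p/dmu_i times mu_i
        grad_c = scale[:, None] * np.einsum("rji,rj->ri", self.A, z)     # dmu/dc = mu * A^T z
        grad_A = -scale[:, None, None] * np.einsum("ri,rj->rij", z, d)   # dmu/dA = -mu * z d^T
        self.c -= lr * grad_c
        self.A -= lr * grad_A * self.mask                          # update lower triangle only

    def consequent_step(self, x, y, lr=0.01):
        """One GD step on E = 0.5 * (y - y_hat)^2 with respect to W and b only."""
        mu, _, _ = self.premise(x)
        phi = mu / mu.sum()
        err = y - phi @ (self.W @ x + self.b)
        self.W += lr * err * np.outer(phi, x)
        self.b += lr * err * phi


# Usage on a single sample; the premise target below is an arbitrary placeholder.
net = CorrelatedTSK(n_rules=3, n_inputs=2)
x, y = np.array([0.5, -0.2]), 1.0
mu_now, _, _ = net.premise(x)
mu_target = np.clip(mu_now + 0.1, 0.0, 1.0)   # hypothetical desired premise outputs
net.premise_step(x, mu_target)
net.consequent_step(x, y)
```

Because `premise_step` only sees `mu_desired`, no error signal travels from the network output back through the normalization and the membership functions into the premise parameters; the consequent update, in turn, is an ordinary least-squares gradient step on a model that is linear in `W` and `b`.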
