LG | Feb 16, 2023

Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK

Xiongtao Zhang, Zezong Yin, Yunliang Jiang, Yizhang Jiang, Danfeng Sun, Yong Liu

arXiv:2302.08038v12.01 citations | h-index: 73

Originality: Incremental advance

AI Analysis

This work addresses the problem of balancing accuracy and interpretability in fuzzy systems for applications like medical diagnosis, though it is incremental as it adapts knowledge distillation to a specific fuzzy classifier context.

The paper tackles the trade-off between performance and interpretability in fuzzy classifiers by proposing HTSK-LLM-DKD, a method that distills knowledge from high-order to low-order TSK classifiers, resulting in enhanced performance comparable to high-order models while maintaining high interpretability, as verified on UCI and Cleveland heart disease datasets.

High-order Takagi-Sugeno-Kang (TSK) fuzzy classifiers deliver strong classification performance with relatively few fuzzy rules, but they suffer from exponentially growing training time and from poorer interpretability, because the consequent part of each fuzzy rule uses a high-order polynomial. Low-order TSK fuzzy classifiers run quickly and are highly interpretable, but they usually require more fuzzy rules and perform comparatively worse. To address this issue, this study proposes HTSK-LLM-DKD, a novel TSK fuzzy classifier that embeds knowledge distillation from deep learning. HTSK-LLM-DKD has the following distinctive characteristics: 1) It takes a high-order TSK classifier as the teacher model and a low-order TSK fuzzy classifier as the student model, and uses the proposed LLM-DKD (Least Learning Machine based Decoupling Knowledge Distillation) to distill fuzzy dark knowledge from the teacher to the student, so that the low-order TSK fuzzy classifier attains performance surpassing or at least comparable to the high-order TSK classifier while retaining high interpretability. 2) The negative Euclidean distance between the teacher model's output and each class is used to obtain the teacher logits, and the teacher and student soft labels are then computed by the softmax function with a distillation temperature parameter. 3) By reformulating the Kullback-Leibler divergence, it decouples the fuzzy dark knowledge into target-class knowledge and non-target-class knowledge and transfers both to the student model. The advantages of HTSK-LLM-DKD are verified on benchmark UCI datasets and on a real dataset, Cleveland heart disease, in terms of classification performance and model interpretability.
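Items 2) and 3) of the abstract can be illustrated with a short NumPy sketch. This is not the authors' LLM-DKD implementation: it omits the Least Learning Machine training of the student entirely and only shows the generic decoupling of the Kullback-Leibler divergence into a target-class term and a non-target-class term. The function names (`teacher_logits`, `decoupled_kd`) are hypothetical, and representing "each class" by a one-hot code vector is an assumption made here for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_logits(outputs, class_codes):
    """Negative Euclidean distance from each teacher output to each class code.

    outputs:     (n, d) teacher outputs
    class_codes: (k, d) one vector per class (assumed one-hot in this sketch)
    returns:     (n, k) logits, all <= 0; the closest class has the largest logit
    """
    diff = outputs[:, None, :] - class_codes[None, :, :]
    return -np.linalg.norm(diff, axis=-1)

def decoupled_kd(t_logits, s_logits, target, T=1.0, eps=1e-12):
    """Split KL(teacher || student) into target and non-target class parts.

    Returns (tckd, nckd, w) per sample such that KL = tckd + w * nckd,
    where w = 1 - (teacher probability of the target class).
    Assumes the target-class probability is strictly below 1.
    """
    p_t, p_s = softmax(t_logits, T), softmax(s_logits, T)
    n, k = p_t.shape
    idx = np.arange(n)
    pt_t, pt_s = p_t[idx, target], p_s[idx, target]

    # Target-class knowledge: KL between binary (target vs. rest) distributions.
    b_t = np.stack([pt_t, 1.0 - pt_t], axis=1)
    b_s = np.stack([pt_s, 1.0 - pt_s], axis=1)
    tckd = np.sum(b_t * (np.log(b_t + eps) - np.log(b_s + eps)), axis=1)

    # Non-target-class knowledge: KL between distributions renormalized
    # over the k - 1 non-target classes.
    mask = np.ones((n, k), dtype=bool)
    mask[idx, target] = False
    q_t = p_t[mask].reshape(n, k - 1) / (1.0 - pt_t)[:, None]
    q_s = p_s[mask].reshape(n, k - 1) / (1.0 - pt_s)[:, None]
    nckd = np.sum(q_t * (np.log(q_t + eps) - np.log(q_s + eps)), axis=1)

    return tckd, nckd, 1.0 - pt_t
```

In decoupled distillation the two terms are typically weighted separately rather than tied together by the factor `w`; how HTSK-LLM-DKD weights them is not stated in the abstract, so the sketch returns the terms unweighted.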
