AI | Apr 29, 2025

Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks

Tianqing Zhang, Zixin Zhu, Kairong Yu, Hongwei Wang

arXiv:2504.20445v2 | 7.82 citations | h-index: 4 | IJCNN

Originality: Incremental advance

AI Analysis

This work targets inefficient knowledge transfer from ANN teachers to SNN students, a problem relevant to researchers and practitioners in neuromorphic computing, and offers an incremental improvement over existing distillation methods.

The paper tackles the performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by proposing Head-Tail Aware KL (HTA-KL) divergence for knowledge distillation. It reports better results than existing methods on most of the evaluated datasets (CIFAR-10, CIFAR-100, and Tiny ImageNet) while using fewer timesteps.

Spiking Neural Networks (SNNs) have emerged as a promising approach for energy-efficient and biologically plausible computation. However, due to limitations in existing training methods and inherent model constraints, SNNs often exhibit a performance gap when compared to Artificial Neural Networks (ANNs). Knowledge distillation (KD) has been explored as a technique to transfer knowledge from ANN teacher models to SNN student models to mitigate this gap. Traditional KD methods typically use Kullback-Leibler (KL) divergence to align output distributions. However, conventional KL-based approaches fail to fully exploit the unique characteristics of SNNs, as they tend to overemphasize high-probability predictions while neglecting low-probability ones, leading to suboptimal generalization. To address this, we propose Head-Tail Aware Kullback-Leibler (HTA-KL) divergence, a novel KD method for SNNs. HTA-KL introduces a cumulative probability-based mask to dynamically distinguish between high- and low-probability regions. It assigns adaptive weights to ensure balanced knowledge transfer, enhancing overall performance. By integrating forward KL (FKL) and reverse KL (RKL) divergence, our method effectively aligns both the head and tail regions of the distribution. We evaluate our method on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Our method outperforms existing methods on most datasets with fewer timesteps.
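
The abstract names three ingredients: a cumulative-probability mask that separates head from tail classes, adaptive weights, and a combination of forward and reverse KL. The sketch below is one possible reading of that description, not the authors' implementation. The function names, the `threshold` and `temperature` parameters, and the rule that sets the weights from the teacher-student gap in each region are all assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def head_mask(teacher_probs, threshold=0.5):
    """Mark 'head' classes: take classes in descending teacher probability
    until their cumulative mass reaches `threshold` (assumed rule)."""
    order = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)
    mask = [False] * len(teacher_probs)
    cumulative = 0.0
    for i in order:
        if cumulative >= threshold:
            break
        mask[i] = True
        cumulative += teacher_probs[i]
    return mask


def hta_kl_sketch(teacher_logits, student_logits,
                  threshold=0.5, temperature=1.0, eps=1e-12):
    """Weighted sum of forward and reverse KL (hypothetical weighting)."""
    p = softmax(teacher_logits, temperature)   # teacher (ANN) distribution
    q = softmax(student_logits, temperature)   # student (SNN) distribution
    mask = head_mask(p, threshold)

    # Forward KL(p || q) is dominated by classes the teacher rates highly.
    fkl = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    # Reverse KL(q || p) penalizes student mass where the teacher has little.
    rkl = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

    # Assumed adaptive weights: share of the absolute teacher-student gap
    # that falls in the head region versus the tail region.
    head_gap = sum(abs(pi - qi) for pi, qi, m in zip(p, q, mask) if m)
    tail_gap = sum(abs(pi - qi) for pi, qi, m in zip(p, q, mask) if not m)
    total_gap = head_gap + tail_gap
    if total_gap == 0.0:
        w_head = w_tail = 0.5
    else:
        w_head = head_gap / total_gap
        w_tail = tail_gap / total_gap

    return w_head * fkl + w_tail * rkl
```

In an SNN setting the student logits would typically be aggregated over timesteps before being passed in; that step is omitted here because the abstract does not specify it.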

