Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
This work targets efficiency and performance issues in deep neural networks and is relevant to both researchers and practitioners. Its contribution is incremental, since it builds on existing layer attention methods.
The paper tackles the problem of redundancy in layer attention mechanisms, where adjacent layers learn similar attention weights, reducing representational capacity and increasing training time. The proposed Efficient Layer Attention (ELA) architecture achieves a 30% reduction in training time while enhancing performance in tasks like image classification and object detection.
Growing evidence suggests that layer attention mechanisms, which enhance interaction among layers in deep neural networks, have significantly advanced network architectures. However, existing layer attention methods suffer from redundancy, as attention weights learned by adjacent layers often become highly similar. This redundancy causes multiple layers to extract nearly identical features, reducing the model's representational capacity and increasing training time. To address this issue, we propose a novel approach that quantifies redundancy using the Kullback-Leibler (KL) divergence between the attention weights of adjacent layers. Additionally, we introduce an Enhanced Beta Quantile Mapping (EBQM) method that accurately identifies and skips redundant layers while maintaining model stability. Our proposed Efficient Layer Attention (ELA) architecture improves both training efficiency and overall performance, achieving a 30% reduction in training time while enhancing performance in tasks such as image classification and object detection.
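To make the redundancy measure concrete, the following is a minimal sketch of computing the KL divergence between the attention weights of adjacent layers. It is an illustration under stated assumptions, not the paper's implementation: it assumes each layer's attention weights are available as an equal-length list of non-negative values, it picks the direction KL(layer l+1 || layer l) arbitrarily, and the helper names `kl_divergence`, `adjacent_layer_kl`, and `flag_redundant` are hypothetical. The fixed threshold in `flag_redundant` is a simple stand-in and is not the EBQM method, whose details are not given here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as non-negative weights.

    Both inputs are normalized to sum to 1 before the divergence is computed.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_sum, q_i / q_sum
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def adjacent_layer_kl(attention_weights):
    """KL divergence of each layer's weights from the previous layer's weights.

    Assumption: attention_weights[l] is the (equal-length) weight vector of layer l.
    Entry i of the result compares layer i + 1 against layer i.
    """
    return [
        kl_divergence(attention_weights[i + 1], attention_weights[i])
        for i in range(len(attention_weights) - 1)
    ]


def flag_redundant(kl_values, threshold):
    """Indices of layers whose KL from the previous layer is below the threshold.

    Assumption: a fixed threshold is used here purely for illustration;
    it is NOT the EBQM procedure described in the abstract.
    """
    return [i + 1 for i, kl in enumerate(kl_values) if kl < threshold]


# Toy example: layer 1 repeats layer 0, layer 2 differs from layer 1.
weights = [
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
kls = adjacent_layer_kl(weights)
redundant = flag_redundant(kls, threshold=0.01)
```

In this toy input the second layer reproduces the first layer's weights exactly, so its divergence is zero and it is the only layer flagged. A small divergence indicates that a layer retrieves nearly the same information as its predecessor, which is the kind of redundancy the abstract describes.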