LG, AI | Feb 27, 2025

Unified Kernel-Segregated Transpose Convolution Operation

arXiv:2502.20493v1 | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses performance bottlenecks in deep learning models using transpose convolutions, such as GANs, by reducing memory and computational overhead, though it is incremental as it builds on existing kernel segregation methods.

The paper tackles the inefficiency of kernel segregation in transpose convolution layers by introducing a unified kernel approach, achieving average computational speedups of 2.03x on an RTX 2070 GPU and 3.89x on an Intel Xeon CPU, and memory savings of up to 35 MB in an EB-GAN model.

Kernel segregation optimizes the transpose convolution layer for deep learning applications. However, it has disadvantages, such as computing extra elements per launched thread when the output feature map has odd dimensions. To mitigate this problem, we introduce a unified kernel segregation approach that limits the usage of memory and computational resources by employing one unified kernel to execute the four sub-kernels. The proposed approach achieves an average computational speedup of 2.03x on an RTX 2070 GPU and 3.89x on an Intel Xeon CPU when tested on specific datasets. The ablation study shows an average computational speedup of 3.5x when evaluating the transpose convolution layers from well-known Generative Adversarial Networks (GANs). Applying the proposed method to the transpose convolution layers of the EB-GAN model yields memory savings of up to 35 MB.
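The idea of executing four sub-kernels through one kernel can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes stride 2, no padding, and a single channel, and the function names are hypothetical. Each output element reads only the kernel taps whose row and column parity match its own position, which amounts to selecting one of the four sub-kernels at run time, so no extra elements are computed when the output dimensions are odd.

```python
import numpy as np


def transpose_conv_naive(x, k, stride=2):
    """Reference result: scatter each input element, scaled by the kernel, into the output."""
    n, m = x.shape
    kh, kw = k.shape
    out = np.zeros(((n - 1) * stride + kh, (m - 1) * stride + kw))
    for i in range(n):
        for j in range(m):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out


def transpose_conv_unified(x, k):
    """Stride-2 gather form (assumed setup): one kernel array, taps chosen by output parity."""
    n, m = x.shape
    kh, kw = k.shape
    out_h, out_w = (n - 1) * 2 + kh, (m - 1) * 2 + kw
    out = np.zeros((out_h, out_w))
    for p in range(out_h):
        for q in range(out_w):
            acc = 0.0
            # Rows a with a % 2 == p % 2 form the sub-kernel rows for this output row.
            for a in range(p % 2, kh, 2):
                i = (p - a) // 2  # p - a is even, so the division is exact
                if i < 0 or i >= n:
                    continue
                # Columns b with b % 2 == q % 2 form the sub-kernel columns.
                for b in range(q % 2, kw, 2):
                    j = (q - b) // 2
                    if 0 <= j < m:
                        acc += x[i, j] * k[a, b]
            out[p, q] = acc
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 4))
    k = rng.standard_normal((3, 3))  # 3x3 kernel gives an odd 9x9 output
    print(np.allclose(transpose_conv_unified(x, k), transpose_conv_naive(x, k)))
```

In this sketch every output element is computed exactly once, and the inner loops skip the multiplications that a zero-inserted (upsampled) input would otherwise spend on zeros.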
