Single-Core Superscalar Optimization of Clifford Neural Layers
This work addresses performance bottlenecks faced by researchers and practitioners who use Clifford neural networks in the physical sciences. The contribution is incremental, since it builds on existing methods.
The paper tackles slow inference in Clifford neural layers by analyzing their computational structure and applying optimizations, achieving an average speedup of 21.35x over a baseline implementation while preserving correctness.
Amid growing interest in the physical sciences in networks with equivariance properties, Clifford neural layers stand out as one approach that delivers $E(n)$ and $O(n)$ equivariances given specific group actions. In this paper, we analyze the inner structure of the computation within Clifford convolutional layers, then propose and implement several optimizations that speed up inference while maintaining correctness. In particular, we first use the theoretical foundations of Clifford algebras to eliminate redundant matrix allocations and computations, then systematically apply established optimization techniques to improve performance further. Across eleven functions, we report a final average speedup of 21.35x over the baseline implementation. In six cases, our runtimes are comparable to or faster than those of the original PyTorch implementation; in the remaining cases, performance is within the same order of magnitude as the original library.
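To make the idea of eliminating redundant matrix allocations concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes a two-dimensional Clifford algebra with basis $(1, e_1, e_2, e_{12})$ and signature $e_1^2 = g_1$, $e_2^2 = g_2$. The function names `kernel_matrix`, `product_via_matrix`, and `product_direct` are hypothetical and chosen for illustration only. The first path materializes the $4 \times 4$ matrix of left multiplication by a weight multivector and then multiplies; the second expands the same product term by term, so no intermediate matrix is allocated.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.
# Multivectors are 4-element lists ordered as (scalar, e1, e2, e12).


def kernel_matrix(w, g1, g2):
    """Materialise the 4x4 matrix of left multiplication by w."""
    w0, w1, w2, w3 = w
    return [
        [w0, g1 * w1, g2 * w2, -g1 * g2 * w3],
        [w1, w0, g2 * w3, -g2 * w2],
        [w2, -g1 * w3, w0, g1 * w1],
        [w3, -w2, w1, w0],
    ]


def product_via_matrix(w, x, g1, g2):
    """Geometric product w * x through an explicitly allocated matrix."""
    k = kernel_matrix(w, g1, g2)
    return [sum(k[i][j] * x[j] for j in range(4)) for i in range(4)]


def product_direct(w, x, g1, g2):
    """Same product, expanded by hand so that no matrix is allocated."""
    w0, w1, w2, w3 = w
    x0, x1, x2, x3 = x
    return [
        w0 * x0 + g1 * w1 * x1 + g2 * w2 * x2 - g1 * g2 * w3 * x3,  # scalar
        w0 * x1 + w1 * x0 - g2 * w2 * x3 + g2 * w3 * x2,            # e1
        w0 * x2 + g1 * w1 * x3 + w2 * x0 - g1 * w3 * x1,            # e2
        w0 * x3 + w1 * x2 - w2 * x1 + w3 * x0,                      # e12
    ]
```

Both functions compute the same geometric product. The direct form replaces a per-call allocation and a generic matrix-vector product with sixteen fixed multiply-add terms, which is the kind of structure that later low-level optimizations can exploit.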