Wave-PDE Nets: Trainable Wave-Equation Layers as an Alternative to Attention
The paper tackles the computational inefficiency of Transformer architectures by introducing Wave-PDE Nets, which use trainable wave equation layers as an alternative to attention mechanisms. The approach matches or exceeds Transformer performance on language and vision benchmarks while reducing wall-clock time by up to 30% and peak memory by 25%.
We introduce Wave-PDE Nets, a neural architecture whose elementary operation is a differentiable simulation of the second-order wave equation. Each layer propagates its hidden state as a continuous field through a medium with trainable spatial velocity c(x) and damping γ(x). A symplectic spectral solver based on FFTs realises this propagation in O(n log n) time. This oscillatory, global mechanism provides a powerful alternative to attention and first-order state-space models. We prove that a single Wave-PDE layer is a universal approximator. On language and vision benchmarks, Wave-PDE Nets match or exceed Transformer performance while demonstrating superior practical efficiency, reducing wall-clock time by up to 30% and peak memory by 25%. Ablation studies confirm the critical role of symplectic integration and a spectral Laplacian for stability and performance. Visualizations of the learned physical parameters reveal that the model learns intuitive strategies for information propagation. These results position Wave-PDE Nets as a computationally efficient and robust architecture with a strong physical inductive bias.
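The propagation step described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the damped wave equation is taken to be u_tt = c(x)^2 u_xx - γ(x) u_t on a periodic 1D grid over sequence positions, the integrator is a semi-implicit (symplectic) Euler scheme, the initial velocity is zero, and the names spectral_laplacian and wave_pde_layer are hypothetical. Only the forward pass is shown; in a real model c and γ would be trainable parameters updated through an autodiff framework.

```python
import numpy as np


def spectral_laplacian(u, dx):
    """Second derivative along axis 0 of a periodic field, via FFT in O(n log n)."""
    n = u.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)  # angular wavenumbers of the grid
    k2 = (k ** 2).reshape((n,) + (1,) * (u.ndim - 1))  # broadcast over channels
    u_hat = np.fft.fft(u, axis=0)
    # d^2/dx^2 becomes multiplication by -k^2 in Fourier space
    return np.real(np.fft.ifft(-k2 * u_hat, axis=0))


def wave_pde_layer(u0, c, gamma, dt, steps, dx):
    """Hypothetical Wave-PDE layer forward pass (sketch).

    Integrates u_tt = c(x)^2 * u_xx - gamma(x) * u_t with semi-implicit Euler,
    which is symplectic when gamma = 0.
    u0:        (n, d) hidden state treated as a field over n positions.
    c, gamma:  (n,) per-position velocity and damping.
    Undamped stability requires roughly dt * max(c) * max|k| < 2.
    """
    u = np.array(u0, dtype=float)  # copy so the input is not modified
    v = np.zeros_like(u)           # assumed zero initial velocity
    c2 = (c ** 2)[:, None]
    g = gamma[:, None]
    for _ in range(steps):
        # kick: update velocity from the current field
        v = v + dt * (c2 * spectral_laplacian(u, dx) - g * v)
        # drift: update the field with the new velocity
        u = u + dt * v
    return u
```

Each step costs one forward and one inverse FFT, which is where the O(n log n) per-step cost comes from. As a sanity check, with γ = 0 and constant c = 1 the single mode sin(x) should evolve as sin(x)cos(t), so after time π the layer returns approximately -sin(x), while a positive γ visibly shrinks the output amplitude.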