Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
This work addresses efficiency and interpretability issues in structured language tasks, which are of interest to NLP researchers and practitioners, though the contribution appears incremental because it modifies existing transformer architectures.
The paper tackles the quadratic complexity bottleneck of dot-product self-attention in sequence-to-sequence models. It introduces the Graph Wavelet Transformer (GWT), which replaces self-attention with a learnable multi-scale wavelet transform based on graph Laplacians built from syntactic or semantic parses, and is reported to achieve comparable performance with linear complexity.
Existing sequence-to-sequence models for structured language tasks rely heavily on the dot-product self-attention mechanism, which incurs quadratic complexity in both computation and memory for input length N. We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi-scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. Our analysis shows that multi-scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self-attention for graph-structured sequence modeling.
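To make the idea of a multi-scale transform over a parse-derived Laplacian concrete, the following is a minimal sketch and not the authors' implementation. It assumes the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} and a polynomial filter per scale; the abstract does not specify either choice. The names `laplacian_apply`, `multiscale_filter`, and `thetas` are hypothetical, and the coefficients that would be learned in a real model are fixed by hand here.

```python
from math import sqrt


def laplacian_apply(edges, n, x):
    """Compute L @ x for L = I - D^{-1/2} A D^{-1/2} without forming L.

    edges: undirected (i, j) pairs, e.g. parse arcs between token indices.
    x: list of n feature vectors. Assumes every node touches at least one edge.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    out = [list(row) for row in x]  # identity term of L
    for i, j in edges:
        w = 1.0 / sqrt(deg[i] * deg[j])
        for d in range(len(x[i])):
            out[i][d] -= w * x[j][d]
            out[j][d] -= w * x[i][d]
    return out


def multiscale_filter(edges, n, x, thetas):
    """Apply one polynomial spectral filter per scale.

    thetas[s][k] is the (hypothetical, learnable) coefficient of L^k at scale s.
    Returns a list with one filtered signal per scale.
    """
    outputs = []
    for theta in thetas:
        power = [list(row) for row in x]  # L^0 x
        acc = [[theta[0] * v for v in row] for row in x]
        for coeff in theta[1:]:
            power = laplacian_apply(edges, n, power)  # L^k x from L^(k-1) x
            for i in range(n):
                for d in range(len(acc[i])):
                    acc[i][d] += coeff * power[i][d]
        outputs.append(acc)
    return outputs


# Toy parse: a 3-token chain 0 - 1 - 2 with one scalar feature per token.
edges = [(0, 1), (1, 2)]
x = [[1.0], [sqrt(2.0)], [1.0]]  # proportional to D^{1/2} 1, the null vector of L
thetas = [
    [1.0, 0.0, 0.0],  # scale 0: identity filter, passes the signal unchanged
    [0.0, 1.0, 0.0],  # scale 1: L itself, removes the null-vector component
]
scales = multiscale_filter(edges, 3, x, thetas)
```

Under these assumptions, each application of L visits every edge once, so a filter of polynomial order K over S scales costs on the order of S·K·|E|·d operations for feature width d. A parse tree over N tokens has N − 1 edges, which is one way the cost could scale linearly in N rather than quadratically.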