F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation
This work addresses the efficiency of structure-informed positional encoding in symbolic music generation, which is relevant to researchers and practitioners, though it appears incremental because it builds on existing kernel approximation techniques.
The paper tackles the quadratic complexity of structure-informed positional encoding in Transformers for symbolic music generation by proposing F-StrIPE, a linear-complexity scheme, and demonstrates its effectiveness in melody harmonization tasks.
While music remains a challenging domain for generative models like Transformers, recent progress has been made by exploiting suitable musically informed priors. One technique to leverage information about musical structure in Transformers is inserting such knowledge into the positional encoding (PE) module. However, Transformers carry a computational cost that is quadratic in sequence length. In this paper, we propose F-StrIPE, a structure-informed PE scheme that runs with linear complexity. Using existing kernel approximation techniques based on random features, we show that F-StrIPE is a generalization of Stochastic Positional Encoding (SPE). We illustrate the empirical merits of F-StrIPE using melody harmonization for symbolic music.
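To make the complexity argument concrete, the following is a minimal sketch of the general random-feature idea the abstract refers to, not an implementation of F-StrIPE or SPE. It assumes a Gaussian kernel, random Fourier features, and unnormalized attention with no positional encoding; the function names (`random_fourier_features`, `quadratic_attention`, `linear_attention`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def random_fourier_features(x, omegas, phases):
    """Map x of shape (n, d) to features of shape (n, r) whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / 2)."""
    r = omegas.shape[1]
    return np.sqrt(2.0 / r) * np.cos(x @ omegas + phases)

def quadratic_attention(phi_q, phi_k, v):
    # Materializes the full (n, n) score matrix: quadratic in sequence length n.
    scores = phi_q @ phi_k.T
    return scores @ v

def linear_attention(phi_q, phi_k, v):
    # Same product, reassociated: the (r, d) summary phi_k.T @ v is built once,
    # so the cost grows linearly in sequence length n.
    return phi_q @ (phi_k.T @ v)

# Illustrative sizes (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n, d, r = 64, 4, 4096
q = 0.5 * rng.normal(size=(n, d))
k = 0.5 * rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

# Frequencies drawn from N(0, I) and phases from U(0, 2*pi) give an unbiased
# estimate of the unit-bandwidth Gaussian kernel.
omegas = rng.normal(size=(d, r))
phases = rng.uniform(0.0, 2.0 * np.pi, size=r)
phi_q = random_fourier_features(q, omegas, phases)
phi_k = random_fourier_features(k, omegas, phases)

# Exact kernel matrix, computed only to measure the approximation error.
sq_dists = ((q[:, None, :] - k[None, :, :]) ** 2).sum(axis=-1)
exact_kernel = np.exp(-0.5 * sq_dists)

approx_error = np.abs(phi_q @ phi_k.T - exact_kernel).max()
reorder_error = np.abs(
    quadratic_attention(phi_q, phi_k, v) - linear_attention(phi_q, phi_k, v)
).max()
```

In this sketch, `approx_error` shrinks as the number of random features `r` grows, while `reorder_error` stays at floating-point precision because the two attention functions compute the same product in a different order. The saving comes entirely from never forming the `(n, n)` matrix.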