GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
This work addresses the generation of realistic instrument sounds for audio synthesis applications. It offers an incremental improvement built on new conditioning techniques.
The paper tackles instrument sound synthesis with GANStrument, a generative adversarial model that generates pitched sounds from a one-shot input with high fidelity and diversity, and that outperforms baselines in generation quality and editability.
We propose GANStrument, a generative adversarial model for instrument sound synthesis. Given a one-shot sound as input, it generates pitched instrument sounds that reflect the timbre of the input at interactive speed. By exploiting instance conditioning, GANStrument improves the fidelity and diversity of the synthesized sounds as well as generalization to diverse inputs. In addition, we introduce an adversarial training scheme for a pitch-invariant feature extractor that significantly improves pitch accuracy and timbre consistency. Experimental results show that GANStrument outperforms strong baselines that do not use instance conditioning in both generation quality and input editability. Qualitative examples are available online.
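One common way to make a feature extractor pitch-invariant through adversarial training is a gradient-reversal-style min-max objective: an auxiliary pitch classifier tries to predict pitch from the extracted feature, while the extractor keeps the feature informative about the instrument and tries to make the pitch classifier fail. The sketch below only computes the two scalar losses for a single example under that assumption. The instrument-classification target, the weight `lam`, and the function names `cross_entropy` and `adversarial_losses` are hypothetical illustrations, not the paper's confirmed formulation.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def adversarial_losses(
    instrument_logits: Sequence[float],
    instrument_id: int,
    pitch_logits: Sequence[float],
    pitch_id: int,
    lam: float = 1.0,
) -> Tuple[float, float]:
    """Return (extractor_loss, pitch_classifier_loss) for one example.

    Assumed objective (hypothetical, gradient-reversal style):
      - the pitch classifier minimizes its pitch cross-entropy;
      - the feature extractor minimizes the instrument (timbre) cross-entropy
        while maximizing the pitch classifier's loss, weighted by `lam`.
    A gradient reversal layer yields exactly this extractor gradient.
    """
    timbre_loss = cross_entropy(instrument_logits, instrument_id)
    pitch_loss = cross_entropy(pitch_logits, pitch_id)
    extractor_loss = timbre_loss - lam * pitch_loss  # extractor "fools" the pitch head
    classifier_loss = pitch_loss                      # pitch head tries to succeed
    return extractor_loss, classifier_loss


if __name__ == "__main__":
    ext, clf = adversarial_losses([2.0, 0.1, -1.0], 0, [0.0, 0.0, 0.0, 0.0], 3, lam=0.5)
    print(f"extractor loss: {ext:.4f}, pitch classifier loss: {clf:.4f}")
```

In a real training loop the two losses would update disjoint parameter sets (extractor vs. pitch classifier). Maximizing cross-entropy directly can be unstable, so some implementations instead push the pitch classifier's output toward a uniform distribution; which variant GANStrument uses is not stated here.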