Autoregressive Score Matching
This work addresses a bottleneck in flexible probabilistic modeling for machine learning researchers, offering a more scalable and stable alternative to existing score matching methods.
The paper tackles the constraint of normalized conditionals in autoregressive models by proposing autoregressive conditional score models (AR-CSM) that use unnormalized scores, introducing Composite Score Matching (CSM) for efficient training without sampling or adversarial methods, and demonstrates applications in density estimation, image generation, denoising, and latent variable models with improved scalability and stability.
Autoregressive models use chain rule to define a joint probability distribution as a product of conditionals. These conditionals need to be normalized, imposing constraints on the functional families that can be used. To increase flexibility, we propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores), which need not be normalized. To train AR-CSM, we introduce a new divergence between distributions named Composite Score Matching (CSM). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. Compared to previous score matching algorithms, our method is more scalable to high dimensional data and more stable to optimize. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.