Variational Autoencoded Multivariate Spatial Fay-Herriot Models
This addresses scalability issues for policymakers and researchers using small area estimation with high-dimensional data, though it is an incremental method combining existing techniques.
The paper tackles the computational inefficiency of multivariate spatial Fay-Herriot models for small area estimation by integrating them with variational autoencoders to learn spatial random effects, achieving extreme improvements in computational efficiency for massive datasets, as demonstrated on California census tract data.
Small area estimation models are essential for estimating population characteristics in regions with limited sample sizes, thereby supporting policy decisions, demographic studies, and resource allocation, among other use cases. The spatial Fay-Herriot model is one such approach that incorporates spatial dependence to improve estimation by borrowing strength from neighboring regions. However, this approach often requires substantial computational resources, limiting its scalability for high-dimensional datasets, especially when considering multiple (multivariate) responses. This paper proposes two methods that integrate the multivariate spatial Fay-Herriot model with spatial random effects, learned through variational autoencoders, to efficiently leverage spatial structure. Importantly, after training the variational autoencoder to represent spatial dependence for a given set of geographies, it may be used again in future modeling efforts, without the need for retraining. Additionally, the use of the variational autoencoder to represent spatial dependence results in extreme improvements in computational efficiency, even for massive datasets. We demonstrate the effectiveness of our approach using 5-year period estimates from the American Community Survey over all census tracts in California.