ML Jan 9, 2023
A domain-decomposed VAE method for Bayesian inverse problems
Zhihang Xu, Yingzhi Xia, Qifeng Liao
Bayesian inverse problems are often computationally challenging when the forward model is governed by complex partial differential equations (PDEs). This is typically caused by expensive forward model evaluations and high-dimensional parameterization of priors. This paper proposes a domain-decomposed variational auto-encoder Markov chain Monte Carlo (DD-VAE-MCMC) method to tackle these challenges simultaneously. By partitioning the global physical domain into small subdomains, the proposed method first constructs local deterministic generative models based on local historical data, which provide efficient local prior representations. Gaussian process models with active learning address the domain decomposition interface conditions. Inversions are then conducted on each subdomain independently, in parallel, and in low-dimensional latent parameter spaces. The local inference solutions are post-processed through the Poisson image blending procedure to produce an efficient global inference result. Numerical examples are provided to demonstrate the performance of the proposed method.
LG Jul 11, 2022
Deep neural network based adaptive learning for switched systems
Junjie He, Zhihang Xu, Qifeng Liao
In this paper, we present a deep neural network based adaptive learning (DNN-AL) approach for switched systems. Currently, deep neural network based methods are actively developed for learning governing equations in unknown dynamic systems, but their efficiency can degenerate for switching systems, where structural changes exist at discrete time instants. In this new DNN-AL strategy, observed datasets are adaptively decomposed into subsets, such that no structural changes occur within each subset. During the adaptive procedures, DNNs are hierarchically constructed, and unknown switching time instants are gradually identified. In particular, network parameters at previous iteration steps are reused to initialize networks for the later iteration steps, which gives efficient training procedures for the DNNs. For the DNNs obtained through our DNN-AL, bounds of the prediction error are established. Numerical studies are conducted to demonstrate the efficiency of DNN-AL.
FLU-DYN Jan 30
Parametrization of subgrid scales in long-term simulations of the shallow-water equations using machine learning and convex limiting
Md Amran Hossan Mojamder, Zhihang Xu, Min Wang et al.
We present a method for parametrizing sub-grid processes in the shallow-water equations. We define coarse variables and local spatial averages and use a feed-forward neural network to learn sub-grid fluxes. Our method results in a local parametrization that uses a four-point computational stencil, which has several advantages over globally coupled parametrizations. We demonstrate numerically that our method improves energy balance in long-term turbulent simulations and also accurately reproduces individual solutions. The neural network parametrization can be easily combined with flux limiting to reduce oscillations near shocks. More importantly, our method provides reliable parametrizations, even in dynamical regimes that are not included in the training data.
SD Oct 25, 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu, Zhihang Xu, Gang Wang et al.
This paper describes the Microsoft end-to-end neural text-to-speech (TTS) system, DelightfulTTS, for Blizzard Challenge 2021. The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates; the second is to model the variation information in speech through a systematic design, which improves prosody and naturalness. Specifically, for 48 kHz modeling, we predict a 16 kHz mel-spectrogram in the acoustic model, and propose a vocoder called HiFiNet to directly generate a 48 kHz waveform from the predicted 16 kHz mel-spectrogram, which can better trade off training efficiency, modelling stability and voice quality. We model variation information systematically from both explicit (speaker ID, language ID, pitch and duration) and implicit (utterance-level and phoneme-level prosody) perspectives: 1) for speaker and language ID, we use lookup embeddings in training and inference; 2) for pitch and duration, we extract the values from paired text-speech data in training and use two predictors to predict the values in inference; 3) for utterance-level and phoneme-level prosody, we use two reference encoders to extract the values in training, and use two separate predictors to predict the values in inference. Additionally, we introduce an improved Conformer block to better model the local and global dependencies in the acoustic model. For task SH1, DelightfulTTS achieves a mean score of 4.17 in the MOS test and 4.35 in the SMOS test, which indicates the effectiveness of our proposed system.