Linear pretraining in recurrent mixture density networks
This is an incremental improvement for researchers and practitioners who use RMDNs for time-series modeling and want to avoid training instability.
The paper tackles the problem of bad local minima and NaN issues during training in recurrent mixture density networks (RMDNs) by proposing a linear pretraining method, which improves performance and ensures the RMDN surpasses its linear GARCH counterpart.
We present a method for pretraining a recurrent mixture density network (RMDN). We also propose a slight modification to the architecture of the RMDN-GARCH proposed by Nikolaev et al. [2012]. The pretraining method helps the RMDN avoid bad local minima during training and improves its robustness to the persistent NaN problem, as defined by Guillaumes [2017], which is often encountered with mixture density networks. This problem consists of frequently obtaining "Not a number" (NaN) values during training. The proposed pretraining method resolves these issues by training the linear nodes in the hidden layer of the RMDN first, before updates to the non-linear nodes are included. This approach improves the performance of the RMDN and ensures that it surpasses the GARCH model, which is the RMDN's linear counterpart.
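The two-phase schedule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sgd_step`, the flat parameter layout, the `linear_mask` array, the learning rate, and the placeholder gradient are all assumptions made for illustration. The only idea taken from the text is that weights feeding the linear hidden nodes are updated first, and weights feeding the non-linear nodes join the updates afterwards.

```python
import numpy as np

def sgd_step(params, grads, linear_mask, pretraining, lr=0.01):
    """One gradient step; during pretraining only linear-node weights move."""
    params = np.asarray(params, dtype=float)
    grads = np.asarray(grads, dtype=float)
    linear_mask = np.asarray(linear_mask, dtype=bool)
    if pretraining:
        # Zero the gradient of every weight that feeds a non-linear node.
        grads = np.where(linear_mask, grads, 0.0)
    return params - lr * grads

# Hypothetical layout: the first three weights feed linear nodes,
# the last two feed non-linear (e.g. tanh) nodes.
params = np.array([0.1, 0.2, 0.3, 0.0, 0.0])
linear_mask = np.array([True, True, True, False, False])
grads = np.ones(5)  # placeholder gradient, not a real likelihood gradient

pre = sgd_step(params, grads, linear_mask, pretraining=True)   # phase 1
joint = sgd_step(pre, grads, linear_mask, pretraining=False)   # phase 2
```

In a real setting the gradient would come from the negative log-likelihood of the mixture density, and the switch from phase 1 to phase 2 would follow whatever criterion the training procedure specifies; neither is modeled here.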