Using stochastic computation graphs formalism for optimization of sequence-to-sequence model
This work offers a theoretical framework for researchers to develop new network architectures with stochastic nodes, but it appears incremental as it builds on existing optimization approaches.
The authors tackled the optimization of sequence-to-sequence models with attention by reformulating it using stochastic computation graphs, providing a unified view and examples in machine translation.
A variety of machine learning problems can be formulated as the optimization of some (surrogate) loss function. The calculation of the loss function can be viewed in terms of stochastic computation graphs (SCG). We use this formalism to analyze the problem of optimizing the well-known sequence-to-sequence model with attention and propose a reformulation of the task. Examples are given for machine translation (MT). Our work provides a unified view of different optimization approaches for sequence-to-sequence models and could help researchers develop new network architectures with embedded stochastic nodes.
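To make the notion of a stochastic node concrete, the following is a minimal sketch and not the formulation used in this work. It assumes a single categorical node z sampled from softmax(logits) and a fixed per-outcome cost, and compares the exact gradient of the expected cost with a score-function (REINFORCE-style) Monte Carlo estimate. All function names, the toy logits, and the toy costs are hypothetical and chosen only for illustration; in an MT setting z might stand for a sampled output token, but that mapping is an assumption here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, costs):
    """Exact gradient of E_{z ~ softmax(logits)}[costs[z]] w.r.t. logits.

    Uses d p_z / d theta_k = p_z * (1[z == k] - p_k), which gives
    d E[c] / d theta_k = p_k * (c_k - E[c]).
    """
    p = softmax(logits)
    expected = sum(pi * ci for pi, ci in zip(p, costs))
    return [pk * (ck - expected) for pk, ck in zip(p, costs)]


def score_function_gradient(logits, costs, num_samples, rng):
    """Monte Carlo estimate of the same gradient via the score function.

    Each sample contributes costs[z] * d log p(z) / d theta, where
    d log p(z) / d theta_j = 1[j == z] - p_j for a softmax parameterization.
    """
    p = softmax(logits)
    k = len(logits)
    grad = [0.0] * k
    for _ in range(num_samples):
        # Sample the stochastic node z from the categorical distribution.
        z = rng.choices(range(k), weights=p)[0]
        for j in range(k):
            indicator = 1.0 if j == z else 0.0
            grad[j] += costs[z] * (indicator - p[j])
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for reproducibility
    logits = [0.5, -0.2, 0.1]       # toy parameters (hypothetical)
    costs = [1.0, 3.0, 2.0]         # toy per-outcome costs (hypothetical)
    print("exact:    ", exact_gradient(logits, costs))
    print("estimated:", score_function_gradient(logits, costs, 50000, rng))
```

The estimate is unbiased but noisy, so it only approaches the exact gradient as the number of samples grows; practical estimators of this kind are usually paired with a variance-reduction baseline, which this sketch omits.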