Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment
This work addresses alignment efficiency and quality for statistical machine translation (SMT), particularly for distant language pairs. The contribution appears incremental, since it builds on existing BTG (bracketing transduction grammar) and fast_align methods.
The paper tackles hierarchical sub-sentential alignment for machine translation by proposing a BTG-forest-based method with fast unsupervised initialization. It achieves translation performance and run-time comparable to fast_align while producing smaller phrase tables, and it outperforms fast_align on distantly related language pairs such as English-Japanese.
In this paper, we propose a novel BTG-forest-based alignment method. After a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align them hierarchically under BTG constraints. Our two-step method achieves the same run-time as fast_align and comparable translation performance, while yielding smaller phrase tables. Final SMT results show that our method even outperforms fast_align in experiments on distantly related language pairs, e.g., English-Japanese.
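To make the idea of top-down hierarchical alignment under BTG constraints more concrete, the following is a minimal sketch, not the method proposed here. It assumes a precomputed matrix `w` of word-to-word association scores (standing in for the model-based initialization), and it greedily keeps only the single best split at each level rather than building a forest. The scoring criterion (total score mass inside the two chosen sub-blocks) and the names `block_mass` and `btg_align` are illustrative assumptions.

```python
from itertools import product


def block_mass(w, i0, i1, j0, j1):
    """Sum of association scores in the block w[i0:i1][j0:j1]."""
    return sum(w[i][j] for i in range(i0, i1) for j in range(j0, j1))


def btg_align(w, i0=0, i1=None, j0=0, j1=None):
    """Greedy top-down bisection of a score matrix under BTG constraints.

    Each block is split into two sub-blocks that are either straight
    (same order in both languages) or inverted (swapped order).
    Returns a list of (source_index, target_index) links.
    """
    if i1 is None:
        i1, j1 = len(w), len(w[0])

    # Terminal block: a single source or target word cannot be split
    # further, so link every cell in the block.
    if i1 - i0 == 1 or j1 - j0 == 1:
        return [(i, j) for i in range(i0, i1) for j in range(j0, j1)]

    best = None
    for i, j in product(range(i0 + 1, i1), range(j0 + 1, j1)):
        # Straight rule: top-left and bottom-right sub-blocks.
        straight = block_mass(w, i0, i, j0, j) + block_mass(w, i, i1, j, j1)
        # Inverted rule: top-right and bottom-left sub-blocks.
        inverted = block_mass(w, i0, i, j, j1) + block_mass(w, i, i1, j0, j)
        for score, is_inverted in ((straight, False), (inverted, True)):
            if best is None or score > best[0]:
                best = (score, i, j, is_inverted)

    _, i, j, is_inverted = best
    if is_inverted:
        return btg_align(w, i0, i, j, j1) + btg_align(w, i, i1, j0, j)
    return btg_align(w, i0, i, j0, j) + btg_align(w, i, i1, j, j1)
```

Calling `btg_align(w)` on a score matrix with one row per source word and one column per target word returns the links found by recursive bisection. Because every split must be straight or inverted, reorderings that violate the BTG constraint cannot be produced by this sketch.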