Rethinking the adaptive relationship between Encoder Layers and Decoder Layers
This work offers an incremental improvement in neural machine translation efficiency, aimed at researchers and practitioners, and focuses on adjustments to model architecture.
The paper tackles the problem of optimizing the adaptive relationship between encoder and decoder layers in a German-to-English translation model by inserting a bias-free fully connected layer between them, with different weight initializations. It finds that fine-tuning the structurally modified model yields suboptimal performance, whereas retraining shows significant potential.
This article explores the adaptive relationship between Encoder Layers and Decoder Layers using the state-of-the-art (SOTA) model Helsinki-NLP/opus-mt-de-en, which translates German to English. The method introduces a bias-free fully connected layer between the Encoder and the Decoder, initializes its weights in different ways, and compares the outcomes of fine-tuning with those of retraining. Four experiments were conducted in total. The results suggest that directly modifying the structure of the pre-trained model and then fine-tuning it yields suboptimal performance. In the retraining experiments, however, the structural adjustment shows significant potential.
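The inserted layer described above can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the class name `EncoderDecoderBridge`, the two initialization schemes ("identity" and "xavier"), and the hidden size of 512 are assumptions made for illustration, since the abstract does not say which initializations were used. The sketch only shows the layer itself, applied to stand-in encoder hidden states, and does not load Helsinki-NLP/opus-mt-de-en.

```python
import torch
import torch.nn as nn


class EncoderDecoderBridge(nn.Module):
    """Bias-free fully connected layer applied to encoder outputs.

    Hypothetical name; the initialization options are illustrative
    assumptions, not the schemes reported in the paper.
    """

    def __init__(self, d_model: int, init: str = "identity"):
        super().__init__()
        # bias=False gives a pure linear map: y = x @ W^T
        self.proj = nn.Linear(d_model, d_model, bias=False)
        if init == "identity":
            # W = I, so the layer initially passes hidden states through unchanged
            nn.init.eye_(self.proj.weight)
        elif init == "xavier":
            # Random weights, so the decoder initially sees perturbed hidden states
            nn.init.xavier_uniform_(self.proj.weight)
        else:
            raise ValueError(f"unknown init: {init!r}")

    def forward(self, encoder_hidden_states: torch.Tensor) -> torch.Tensor:
        # Shape is preserved: (batch, source_length, d_model)
        return self.proj(encoder_hidden_states)


torch.manual_seed(0)
d_model = 512  # assumed hidden size, for illustration only
hidden = torch.randn(2, 7, d_model)  # stand-in for encoder outputs

identity_bridge = EncoderDecoderBridge(d_model, init="identity")
random_bridge = EncoderDecoderBridge(d_model, init="xavier")

identity_out = identity_bridge(hidden)
random_out = random_bridge(hidden)
```

With the identity initialization the decoder receives exactly the hidden states it saw during pre-training, whereas a random initialization changes them from the first step. Under these assumptions, that difference is one plausible reason why the choice of initialization could matter more for fine-tuning than for retraining from scratch.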