Partially Randomizing Transformer Weights for Dialogue Response Diversity
This work addresses the issue of repetitive responses in dialogue systems, though the contribution is incremental because it builds on existing transformer architectures.
The paper tackles the problem of low response diversity in generative open-domain dialogue by proposing the PaRaFormer, a transformer extension that freezes selected layers after random initialization, achieving performance comparable to prior methods without additional training difficulty or increased model complexity.
Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training or inference, or a significant increase in model size and complexity. Hence, we propose the \underline{Pa}rtially \underline{Ra}ndomized trans\underline{Former} (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaFormer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity.
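The core mechanism, freezing the weights of selected layers after random initialization, can be illustrated with a minimal sketch. This is not the PaRaFormer itself: it assumes a toy two-layer tanh network in plain Python rather than a transformer, and the names `init_layer`, `matvec`, `train_step`, and `frozen` are hypothetical. The abstract does not say which layers are frozen, so freezing layer 0 here is an arbitrary choice for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialise a dense layer as a list of weight rows."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def matvec(layer, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]


def train_step(layers, frozen, x, target, lr=0.05):
    """One SGD step on 0.5 * squared error.

    Layers whose index is in `frozen` are never updated.
    Returns the loss measured before the update.
    """
    h = [math.tanh(a) for a in matvec(layers[0], x)]
    y = matvec(layers[1], h)
    err = [yi - ti for yi, ti in zip(y, target)]

    # Gradient w.r.t. the hidden activations, taken before layer 1 changes.
    grad_h = [sum(err[k] * layers[1][k][j] for k in range(len(err)))
              for j in range(len(h))]

    if 1 not in frozen:
        for k, row in enumerate(layers[1]):
            for j in range(len(row)):
                row[j] -= lr * err[k] * h[j]

    if 0 not in frozen:
        for j, row in enumerate(layers[0]):
            local = grad_h[j] * (1.0 - h[j] ** 2)  # tanh derivative
            for i in range(len(row)):
                row[i] -= lr * local * x[i]

    return 0.5 * sum(e * e for e in err)


# Assumed toy setup: 4 inputs, 8 hidden units, 2 outputs, one training example.
rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 2, rng)]
frozen = {0}  # layer 0 keeps its random initial weights for the whole run
snapshot = [row[:] for row in layers[0]]

x, target = [1.0, 0.5, -0.5, 2.0], [1.0, -1.0]
losses = [train_step(layers, frozen, x, target) for _ in range(50)]
```

After the loop, `layers[0]` still equals `snapshot`, while the trainable layer has adapted on top of the fixed random features and `losses` has decreased. Freezing adds no parameters and no extra training machinery, which matches the abstract's claim of no increase in model complexity.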