CL · Sep 11, 2021

HYDRA -- Hyper Dependency Representation Attentions

Ha-Thanh Nguyen, Vu Tran, Tran-Binh Dang, Minh-Quan Bui, Minh-Phuong Nguyen, Le-Minh Nguyen

arXiv:2109.05349v1 · 0.2

Originality: Incremental advance

AI Analysis

This work addresses data efficiency and knowledge integration for researchers and practitioners who use transformer models, though it is an incremental contribution that builds on existing linguistic and transformer paradigms.

The paper tackles the difficulty of determining how much data large transformer models need. It proposes HYDRA heads, lightweight pretrained linguistic self-attention heads that inject knowledge without full retraining, and reports improved performance along with architecture-friendly integration.

Attention is all we need as long as we have enough data. Even so, it is not always easy to determine how much data is enough while models keep growing larger. In this paper, we propose HYDRA heads, lightweight pretrained linguistic self-attention heads that inject knowledge into transformer models without pretraining them again. Our approach strikes a balance between leaving the models to learn without supervision and forcing them to conform rigidly to linguistic knowledge, as suggested in previous studies. Our experiments show that the approach not only boosts the performance of the model but is also lightweight and architecture-friendly. We empirically verify our framework on benchmark datasets to show the contribution of linguistic knowledge to a transformer model. This is a promising result for a new approach to transferring knowledge from linguistic resources into transformer-based models.
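The listing does not describe how a HYDRA head is built, so the following is only a minimal sketch of the general idea under stated assumptions: a single scaled dot-product self-attention head whose attention weights are supervised to point each token at its dependency parent, and whose output is added residually to the hidden states of an otherwise unchanged transformer. The function names (`attention_head`, `dependency_loss`, `inject`), the parent-index encoding, the cross-entropy supervision, and the residual-add injection are all hypothetical illustration choices, not the paper's method.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(h: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """One scaled dot-product self-attention head over hidden states h (n x d).
    Returns the (n x n) attention weights and the (n x d_v) head output."""
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    scale = math.sqrt(len(wq[0]))  # sqrt(d_k)
    scores = matmul(q, transpose(k))
    attn = [softmax([s / scale for s in row]) for row in scores]
    return attn, matmul(attn, v)


def dependency_loss(attn: Matrix, parents: List[int]) -> float:
    """Hypothetical linguistic supervision: mean cross-entropy between each
    token's attention row and a one-hot target on its dependency parent
    (the root token points to itself). Assumes every target weight is > 0."""
    return -sum(math.log(attn[i][p]) for i, p in enumerate(parents)) / len(parents)


def inject(h: Matrix, head_out: Matrix) -> Matrix:
    """Hypothetical injection: add the head output to the base model's hidden
    states residually, leaving the base model itself untouched (needs d_v == d)."""
    return [[x + y for x, y in zip(h_row, o_row)] for h_row, o_row in zip(h, head_out)]


# Toy 3-token sentence; h stands in for hidden states of a frozen encoder.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder projection weights
attn, out = attention_head(h, identity, identity, identity)
parents = [1, 1, 1]  # token 1 is the root; tokens 0 and 2 attach to it
loss = dependency_loss(attn, parents)  # in training, only the head's weights would be updated to lower this
new_h = inject(h, out)
print(f"dependency loss: {loss:.4f}")
```

Plain lists keep the sketch dependency-free and easy to trace by hand; a practical version would use a tensor library, batch the computation, and learn the projection weights rather than fixing them to the identity.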
