CL · Sep 11, 2021

HYDRA -- Hyper Dependency Representation Attentions

arXiv:2109.05349v1
Originality: Incremental advance
AI Analysis

The paper addresses data efficiency and knowledge integration for researchers and practitioners who use transformer models. The advance is incremental: it builds on existing linguistic and transformer paradigms.

The paper tackles the difficulty of knowing how much data a large transformer model needs. It proposes HYDRA heads, lightweight pretrained linguistic self-attention heads that inject knowledge into a transformer model without pretraining it again. The authors report improved performance and describe the approach as architecture-friendly.

Attention is all we need as long as we have enough data. Even so, it is sometimes not easy to determine how much data is enough while the models are becoming larger and larger. In this paper, we propose HYDRA heads, lightweight pretrained linguistic self-attention heads to inject knowledge into transformer models without pretraining them again. Our approach is a balanced paradigm between leaving the models to learn unsupervised and forcing them to conform to linguistic knowledge rigidly, as suggested in previous studies. Our experiments show that the approach not only boosts the performance of the model but is also lightweight and architecture-friendly. We empirically verify our framework on benchmark datasets to show the contribution of linguistic knowledge to a transformer model. This is a promising result for a new approach to transferring knowledge from linguistic resources into transformer-based models.
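The abstract does not spell out how a HYDRA head is wired into a model, so the following is only a minimal sketch of the general idea: one extra self-attention head whose output is added to a frozen model's hidden states. The function names (`attention_head`, `inject`), the residual-add combination, and all weight values are assumptions made for illustration, not the authors' implementation. In a real setting the head's weights would come from pretraining on a linguistic resource such as dependency annotations.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(hidden, w_q, w_k, w_v):
    """One scaled dot-product self-attention head over a token sequence."""
    q, k, v = matmul(hidden, w_q), matmul(hidden, w_k), matmul(hidden, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    return weights, matmul(weights, v)

def inject(hidden, head_out, w_o):
    """Assumed combination: project the head output and add it residually."""
    proj = matmul(head_out, w_o)
    return [[h + p for h, p in zip(h_row, p_row)] for h_row, p_row in zip(hidden, proj)]

# Toy input: 3 tokens with 4-dimensional hidden states from a frozen model.
hidden = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.1, 0.0, 0.2],
          [0.3, 0.3, 0.1, 0.0]]

# Hypothetical head weights (4 -> 2); placeholders for pretrained values.
w_q = [[0.2, -0.1], [0.0, 0.3], [0.1, 0.1], [-0.2, 0.4]]
w_k = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.2], [0.2, 0.1]]
w_v = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1], [0.2, -0.2]]
w_o = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.1, 0.0, 0.2]]  # 2 -> 4 output projection

weights, head_out = attention_head(hidden, w_q, w_k, w_v)
enriched = inject(hidden, head_out, w_o)
```

Only the small head matrices are new parameters here; `hidden` stands in for the output of a model that is left untouched, which is one way to read "lightweight" and "without pretraining them again".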

