CL, Oct 12, 2020

Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations

arXiv:2010.05763v1, 996 citations
Originality: Incremental advance
AI Analysis

This work addresses BERT's over-parameterization and under-utilization in hierarchical multilabel text classification. It is an incremental advance that improves both classification results and parameter utilization.

The paper tackles BERT's over-parameterization and under-utilization issues by proposing a structured fine-tuning approach that guides specific layers to predict labels from specific hierarchy levels in large-scale multilabel text classification, resulting in better classification results and parameter utilization.

Although BERT is widely used by the NLP community, little is known about its inner workings. Several attempts have been made to shed light on certain aspects of BERT, often with contradicting conclusions. A frequently raised concern is BERT's over-parameterization and under-utilization. To this end, we propose a novel approach to fine-tune BERT in a structured manner. Specifically, we focus on Large Scale Multilabel Text Classification (LMTC), where documents are assigned one or more labels from a large predefined set of hierarchically organized labels. Our approach guides specific BERT layers to predict labels from specific hierarchy levels. Experimenting with two LMTC datasets, we show that this structured fine-tuning approach not only yields better classification results but also leads to better parameter utilization.
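The idea of tying specific layers to specific hierarchy levels can be illustrated with a small sketch. The code below is not the paper's implementation: the even spacing of levels across layers, the function names (`assign_layers_to_levels`, `binary_cross_entropy`, `guided_loss`), and the unweighted sum of per-level losses are all assumptions made for illustration. It only shows the general shape of such an objective, where each hierarchy level gets its own multilabel loss computed from the output of the layer assigned to it.

```python
import math


def assign_layers_to_levels(num_layers, num_levels):
    """Map each hierarchy level to one encoder layer (hypothetical scheme).

    Levels are spaced evenly; level 0 is the most general one, and the
    deepest level is always tied to the final layer. Layers are 1-based.
    """
    if not 1 <= num_levels <= num_layers:
        raise ValueError("need 1 <= num_levels <= num_layers")
    step = num_layers // num_levels
    return {
        level: num_layers - (num_levels - 1 - level) * step
        for level in range(num_levels)
    }


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one hierarchy level."""
    total = 0.0
    for p, t in zip(probs, targets):
        total -= t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
    return total / len(probs)


def guided_loss(per_level_probs, per_level_targets):
    """Sum the per-level losses (assumed unweighted) into one objective.

    per_level_probs[i] holds the label probabilities predicted from the
    layer assigned to hierarchy level i.
    """
    return sum(
        binary_cross_entropy(probs, targets)
        for probs, targets in zip(per_level_probs, per_level_targets)
    )


# A 12-layer encoder and a 3-level label hierarchy.
layer_for_level = assign_layers_to_levels(12, 3)  # {0: 4, 1: 8, 2: 12}
```

In a real model, the probabilities for each level would come from a classification head attached to the hidden states of the assigned layer, and the summed loss would be backpropagated through the whole encoder.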

Code Implementations: 1 repo