CL, AI · Jul 17, 2025

Making Language Model a Hierarchical Classifier

arXiv:2507.12930v2 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a limitation of decoder-only language models, namely that they decode only from the last layer. It offers researchers and practitioners an incremental improvement through architectural adaptation.

The paper tackles the fact that decoder-only language models decode only from the last layer. It proposes a hierarchical decoder architecture in which intermediate layers also decode text, built by adapting a pretrained model: the last-layer language head is copied to selected intermediate layers and fine-tuned. The results show that this approach achieves state-of-the-art performance on tasks such as hierarchical text classification and generation, outperforming baselines on datasets including WoS, DBpedia, and EmpatheticDialogues.
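To make the head-copying idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the toy `LinearHead` class, the helpers `attach_copied_heads` and `hierarchical_decode`, the two-dimensional hidden states, and the four-word vocabulary are all hypothetical. It illustrates only the mechanics described above: each selected intermediate layer receives an independent copy of the last-layer head, so adjusting one copy leaves the others untouched, and every layer that has a head decodes its own token.

```python
import copy
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Toy language head (hypothetical): maps a hidden vector to vocabulary logits."""

    def __init__(self, weight):
        self.weight = weight  # nested list, shape [vocab_size][hidden_size]

    def __call__(self, hidden):
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]


def attach_copied_heads(last_head, layer_ids):
    # Hypothetical helper: give each selected intermediate layer its own deep
    # copy of the last-layer head so each copy can be fine-tuned independently.
    return {i: copy.deepcopy(last_head) for i in layer_ids}


def hierarchical_decode(hidden_states, heads, vocab):
    # hidden_states[i] is the (toy) hidden vector produced by layer i.
    # Every layer that owns a head greedily picks its most probable token.
    out = {}
    for layer, head in sorted(heads.items()):
        probs = softmax(head(hidden_states[layer]))
        best = max(range(len(probs)), key=probs.__getitem__)
        out[layer] = vocab[best]
    return out


# Usage: a 4-layer toy model whose last layer (index 3) keeps the original head
# and whose intermediate layer 1 gets a copy.
vocab = ["science", "sports", "physics", "football"]
last_head = LinearHead([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
heads = attach_copied_heads(last_head, [1])
heads[3] = last_head

# Stand-in for fine-tuning: suppress the fine-grained labels in layer 1's copy only.
heads[1].weight[2] = [0.0, 0.0]
heads[1].weight[3] = [0.0, 0.0]

hidden_states = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
result = hierarchical_decode(hidden_states, heads, vocab)
print(result)  # {1: 'science', 3: 'physics'}
```

Here the copy at layer 1 is edited by hand to stand in for fine-tuning toward coarse labels, so it decodes `science` while the untouched last-layer head decodes `physics`. A real model would use learned hidden states and gradient-based fine-tuning instead of manual weight edits.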

Decoder-only language models, such as GPT and LLaMA, generally decode only from the last layer. Motivated by humans' hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built in which different layers decode text simultaneously. Due to limited time and computational resources, we choose to adapt a pretrained language model into this form of hierarchical decoder. The language head of the last layer is copied to selected intermediate layers, and each copy is fine-tuned with different task inputs. Through extensive experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and reasonable content, and that this hierarchical decoder paradigm achieves state-of-the-art performance on multiple tasks, such as hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM, our hierarchical decoder, outperforms all baselines on WoS, DBpedia, ESconv, EmpatheticDialogues, and several cognitive tests. We also provide a thorough theoretical analysis to validate the convergence and computational savings of our methodology. This study suggests the possibility of a generalized hierarchical reasoner pretrained from scratch.
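The abstract does not state the fine-tuning objective. One plausible reading for hierarchical text classification, assumed here rather than taken from the paper, is to train each decoding layer on the label for its own level of the hierarchy and sum the per-layer cross-entropy losses. The sketch below computes such a loss in pure Python; the function names and the optional per-layer `weights` are hypothetical.

```python
import math


def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed stably via log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def hierarchical_loss(layer_logits, layer_targets, weights=None):
    # layer_logits:  {layer_id: logits from that layer's head}
    # layer_targets: {layer_id: index of the label that layer should decode}
    # weights:       optional {layer_id: weight}; missing layers default to 1.0
    weights = weights or {}
    total = 0.0
    for layer, logits in layer_logits.items():
        total += weights.get(layer, 1.0) * cross_entropy(logits, layer_targets[layer])
    return total


# Usage: layer 1 predicts a coarse label (2 classes), layer 3 a fine label (4 classes).
loss = hierarchical_loss(
    {1: [2.0, 0.0], 3: [0.0, 0.0, 0.0, 0.0]},
    {1: 0, 3: 2},
)
print(loss)
```

With uniform logits over four fine labels, the last layer contributes exactly log 4, while the intermediate layer, already confident in its coarse label, adds only about 0.13. Weighting the levels differently is a design choice the paper may or may not make.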

Code Implementations: 1 repo