CL LG May 3, 2023

Using Language Models on Low-end Hardware

arXiv:2305.02350v2
Originality: Synthesis-oriented
AI Analysis

This work addresses resource constraints for users with low-end hardware, but it is incremental as it builds on existing language models and CNN architectures.

The paper tackles the problem of training text classification networks on low-end hardware. It evaluates fixed (not fine-tuned) language models combined with a CNN architecture across 8 datasets and finds that, in some scenarios, not fine-tuning the language model yields competitive effectiveness with faster training and only a quarter of the memory required for fine-tuning.

This paper evaluates the viability of using fixed language models for training text classification networks on low-end hardware. We combine language models with a CNN architecture and put together a comprehensive benchmark with 8 datasets covering single-label and multi-label classification of topic, sentiment, and genre. Our observations are distilled into a list of trade-offs, concluding that there are scenarios where not fine-tuning a language model yields competitive effectiveness at faster training, requiring only a quarter of the memory compared to fine-tuning.
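The idea of a fixed language model feeding a trainable CNN can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `frozen_encoder` is a hypothetical stand-in for a real language model, and the sizes (`EMB_DIM`, `KERNEL`, `FILTERS`, `CLASSES`) are made up for the example. It shows only the structural point that the encoder output can be computed once and cached, while the convolution and output weights are the only parameters that training would update.

```python
import math
import random

random.seed(0)
EMB_DIM = 8   # assumed embedding width of the stand-in encoder
KERNEL = 3    # assumed convolution window, in tokens
FILTERS = 4   # assumed number of convolution filters
CLASSES = 2   # assumed number of labels


def frozen_encoder(token):
    """Hypothetical stand-in for a fixed language model:
    a deterministic, non-trainable map from a token to a vector."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]


_cache = {}


def encode(tokens):
    """Encode once and cache; the encoder has no weights to update."""
    key = tuple(tokens)
    if key not in _cache:
        _cache[key] = [frozen_encoder(t) for t in tokens]
    return _cache[key]


# Only these CNN and output weights would be trained.
conv_w = [[[random.gauss(0, 0.1) for _ in range(EMB_DIM)]
           for _ in range(KERNEL)] for _ in range(FILTERS)]
out_w = [[random.gauss(0, 0.1) for _ in range(FILTERS)]
         for _ in range(CLASSES)]


def classify(tokens):
    """Forward pass; assumes the input has at least KERNEL tokens."""
    x = encode(tokens)
    pooled = []
    for f in conv_w:
        # Slide the filter over each window, apply ReLU,
        # then keep the maximum activation (max-over-time pooling).
        acts = []
        for start in range(len(x) - KERNEL + 1):
            s = sum(f[k][d] * x[start + k][d]
                    for k in range(KERNEL) for d in range(EMB_DIM))
            acts.append(max(0.0, s))
        pooled.append(max(acts))
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in out_w]
    # Numerically stable softmax over the class logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


trainable = FILTERS * KERNEL * EMB_DIM + CLASSES * FILTERS
probs = classify("the film was surprisingly good".split())
```

In this sketch the trainable parameter count depends only on the CNN and output layer, not on the encoder. With a real language model, the frozen weights still have to be loaded for the forward pass, but no gradients or optimizer state need to be kept for them.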
