LG · CL · Dec 16, 2024

Krony-PT: GPT2 compressed with Kronecker Products

arXiv:2412.12351v2 · 2 citations · h-index: 12 · 2025 3rd International Conference on Foundation and Large Language Models (FLLM)
Originality: Incremental advance
AI Analysis

This paper addresses model compression for language models such as GPT-2, offering a smaller alternative to the original model. The contribution appears incremental, since it builds on existing Kronecker-based methods.

The paper compresses GPT-2 with Kronecker products, targeting the feed-forward weights of each transformer block. The resulting models range from 80M to 96M parameters, and the 81M variant outperforms DistilGPT2 on next-token prediction across standard language modeling datasets.
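
To make the parameter savings concrete, the following sketch counts parameters when a dense feed-forward matrix is replaced by two Kronecker factors. The 768 x 3072 shape is that of a feed-forward weight matrix in the 124M GPT-2; the factor shape (384 x 1536 paired with 2 x 2) and the function names are hypothetical choices for illustration, not configurations taken from the paper.

```python
# Parameter count of a dense matrix versus a Kronecker-factored one.
# If W (rows x cols) is approximated by A ⊗ B with A of shape (m1 x n1),
# then B must have shape (rows / m1) x (cols / n1).

def kron_factor_shapes(rows, cols, m1, n1):
    """Return the shapes of A and B such that A ⊗ B has shape rows x cols."""
    if rows % m1 != 0 or cols % n1 != 0:
        raise ValueError("factor shape must divide the matrix shape")
    return (m1, n1), (rows // m1, cols // n1)


def kron_param_count(rows, cols, m1, n1):
    """Number of parameters stored by the two Kronecker factors."""
    (a_rows, a_cols), (b_rows, b_cols) = kron_factor_shapes(rows, cols, m1, n1)
    return a_rows * a_cols + b_rows * b_cols


D_MODEL, D_FF, N_LAYERS = 768, 3072, 12   # GPT-2 (124M) dimensions
MATRICES_PER_BLOCK = 2                     # up-projection and down-projection

dense = D_MODEL * D_FF
# Hypothetical factor shape, chosen only to illustrate the arithmetic.
factored = kron_param_count(D_MODEL, D_FF, 384, 1536)
saved = N_LAYERS * MATRICES_PER_BLOCK * (dense - factored)

print(f"dense: {dense:,}  factored: {factored:,}  saved overall: {saved:,}")
```

Biases, attention weights, and embeddings are left untouched in this count, so the saving applies only to the feed-forward weight matrices.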

We introduce Krony-PT, a compression technique for GPT-2 based on Kronecker products. We specifically target the feed-forward weights of each transformer block, and systematically compress the feed-forward layer matrices to various degrees. We introduce a modified Van Loan decomposition to initialize new Kronecker factors, and also propose a new pruning-based initialization technique. Our method compresses the original 124M-parameter GPT-2 to various smaller models, ranging from 80M to 96M. Our 81M model variant outperforms DistilGPT2 on next-token prediction across all standard language modeling datasets, and shows competitive or comparable performance with significantly larger Kronecker-based compressions of GPT-2.
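
As a rough illustration of how Kronecker factors can be initialized from an existing weight matrix, the sketch below implements the classical Van Loan-Pitsianis nearest-Kronecker-product approximation. It is not the modified decomposition or the pruning-based initialization proposed in the paper. It uses only the Python standard library and substitutes a plain power iteration for the usual SVD, and it assumes a nonzero input matrix and a start vector that is not orthogonal to the leading singular vector. All function names are illustrative.

```python
import math


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    m2, n2 = len(B), len(B[0])
    return [[A[i // m2][j // n2] * B[i % m2][j % n2]
             for j in range(len(A[0]) * n2)]
            for i in range(len(A) * m2)]


def rearrange(W, m1, n1, m2, n2):
    """Van Loan rearrangement: row (i1, j1) holds the flattened (i1, j1) block of W.

    With this layout, W = A ⊗ B exactly when the result equals vec(A) vec(B)^T.
    """
    return [[W[i1 * m2 + i2][j1 * n2 + j2]
             for i2 in range(m2) for j2 in range(n2)]
            for i1 in range(m1) for j1 in range(n1)]


def nearest_kron(W, m1, n1, m2, n2, iters=200):
    """Best rank-1 fit of the rearranged matrix, reshaped into factors A and B."""
    R = rearrange(W, m1, n1, m2, n2)
    v = [1.0] * (m2 * n2)  # assumed not orthogonal to the top singular vector
    for _ in range(iters):
        u = [sum(r[k] * v[k] for k in range(len(v))) for r in R]               # u = R v
        v = [sum(R[i][k] * u[i] for i in range(len(R))) for k in range(len(v))]  # v = R^T u
        norm = math.sqrt(sum(x * x for x in v))  # zero only if W is zero
        v = [x / norm for x in v]
    u = [sum(r[k] * v[k] for k in range(len(v))) for r in R]  # carries the singular value
    A = [u[i * n1:(i + 1) * n1] for i in range(m1)]
    B = [v[i * n2:(i + 1) * n2] for i in range(m2)]
    return A, B


# Usage: recover factors from a matrix that is an exact Kronecker product.
A0 = [[1.0, 2.0], [3.0, 4.0]]
B0 = [[0.0, 5.0], [6.0, 7.0]]
W = kron(A0, B0)
A, B = nearest_kron(W, 2, 2, 2, 2)
W_hat = kron(A, B)
max_err = max(abs(W[i][j] - W_hat[i][j]) for i in range(4) for j in range(4))
print(f"max reconstruction error: {max_err:.2e}")
```

The individual factors are determined only up to a scalar (scaling A by c and B by 1/c leaves the product unchanged), so the check compares the reconstructed product rather than A and B themselves. For matrices of realistic size, an SVD routine from a numerical library would be the more practical choice.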

