AlphaZip: Neural Network-Enhanced Lossless Text Compression
This work addresses text compression for data storage and transmission. Its contribution is incremental, since it builds on existing neural and standard compression methods.
The paper tackles lossless text compression by using a Large Language Model for prediction combined with standard compression algorithms, achieving improved performance compared to conventional baselines.
Data compression continues to evolve, with traditional information-theoretic methods widely used for compressing text, images, and videos. Recently, there has been growing interest in leveraging Generative AI for predictive compression. This paper introduces a lossless text compression approach using a Large Language Model (LLM). The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compression of the predicted ranks with standard compression algorithms such as Adaptive Huffman, LZ77, or Gzip. Extensive analysis and benchmarking against conventional information-theoretic baselines demonstrate that neural compression offers improved performance.
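The two-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the LLM is replaced by a toy adaptive bigram counter (the hypothetical `BigramRanker` class), the standard compressor is Python's `zlib` (DEFLATE, the algorithm behind Gzip), and "predicted ranks" is assumed to mean the position of each actual symbol in the model's ranked list of candidates. The function names `encode` and `decode` are likewise illustrative.

```python
import zlib
from collections import Counter, defaultdict


class BigramRanker:
    """Toy stand-in for the LLM (an assumption, not the paper's model).

    Ranks candidate next symbols by how often they followed the previous one.
    """

    def __init__(self, alphabet):
        self.alphabet = sorted(alphabet)
        self.counts = defaultdict(Counter)

    def ranking(self, prev):
        c = self.counts[prev]
        # Most frequent first; ties broken alphabetically so that the
        # encoder and decoder always produce the same ordering.
        return sorted(self.alphabet, key=lambda s: (-c[s], s))

    def update(self, prev, sym):
        self.counts[prev][sym] += 1


def encode(text, alphabet):
    """Step 1: map each symbol to its rank. Step 2: compress the ranks."""
    assert len(set(alphabet)) <= 256, "each rank must fit in one byte"
    model = BigramRanker(alphabet)
    prev, ranks = "", []
    for ch in text:
        ranks.append(model.ranking(prev).index(ch))
        model.update(prev, ch)
        prev = ch
    return zlib.compress(bytes(ranks), 9)


def decode(blob, alphabet):
    """Replay the same model to turn ranks back into symbols."""
    model = BigramRanker(alphabet)
    prev, out = "", []
    for r in zlib.decompress(blob):
        ch = model.ranking(prev)[r]
        model.update(prev, ch)
        out.append(ch)
        prev = ch
    return "".join(out)


# Usage: the round trip is lossless as long as both sides share the alphabet.
sample = "the quick brown fox jumps over the lazy dog. " * 20
blob = encode(sample, set(sample))
restored = decode(blob, set(sample))
```

The scheme stays lossless because the decoder rebuilds the same predictions from the symbols it has already recovered, so only the ranks need to be stored. A better predictor places the true symbol at rank 0 more often, which makes the rank stream more repetitive and easier for the second-stage compressor to shrink.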