LG PF
Jul 7, 2025

Accuracy and Consumption analysis from a compressed model by CompactifAI from Multiverse Computing

Damien Fovet, Shashank Chamoli, Sarah Oury, Srishti Singhal

arXiv:2507.08836v1
1 citation, h-index: 1

Originality: Synthesis-oriented

AI Analysis

This study addresses efficiency and cost issues for users deploying large language models, though it appears incremental because it applies an existing compression method to a new model.

The study evaluated CompactifAI's compression method on Llama 3.1 8B, finding it significantly reduced computational resources while maintaining model accuracy, making the model more efficient and cost-effective.

This study evaluates the performance of CompactifAI, a compression method developed by Multiverse Computing, applied to the large language model Llama 3.1 8B\cite{llama}. The evaluation focused on model efficiency (in terms of energy consumption) and accuracy, measured with the Codecarbon\cite{codecarbon} and Ragas\cite{ragas} frameworks, respectively. The model compressed with CompactifAI\cite{compactifai}\cite{compactifai2} was compared against its full-size version. Our findings reveal that the compressed model not only significantly reduced the computational resources required but also maintained model accuracy, making the model more efficient, scalable and cost-effective.
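The comparison described above reduces to two quantities per model: energy consumed over an evaluation run and an accuracy score. A minimal sketch of how such a comparison might be summarized is shown here. The `RunMeasurement` type, the `compare_runs` helper, and all numbers are hypothetical placeholders for illustration; they are not the authors' code or results, and the energy and accuracy values are assumed to come from whatever tracker and evaluation framework is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMeasurement:
    """One evaluation run of a model (field names are illustrative assumptions)."""
    energy_kwh: float  # energy consumed over the run, as reported by an energy tracker
    accuracy: float    # mean answer-quality score in [0, 1] from an evaluation framework


def compare_runs(baseline: RunMeasurement, compressed: RunMeasurement) -> dict:
    """Return the relative energy saving and the absolute accuracy change."""
    if baseline.energy_kwh <= 0:
        raise ValueError("baseline energy must be positive")
    # Fraction of the baseline energy that the compressed model avoids using.
    energy_saving = 1.0 - compressed.energy_kwh / baseline.energy_kwh
    # Negative values mean the compressed model scored lower than the baseline.
    accuracy_delta = compressed.accuracy - baseline.accuracy
    return {"energy_saving": energy_saving, "accuracy_delta": accuracy_delta}


# Placeholder numbers for illustration only; they are NOT results from the paper.
baseline = RunMeasurement(energy_kwh=2.0, accuracy=0.80)
compressed = RunMeasurement(energy_kwh=1.0, accuracy=0.75)
report = compare_runs(baseline, compressed)
```

Reporting the energy change as a ratio and the accuracy change as a difference keeps the two axes separate, so a reader can judge whether a given saving is worth a given accuracy cost.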
