ML LG May 28, 2025

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

arXiv:2505.22811v2 3 citations h-index: 2

Originality: Highly original

AI Analysis

This work addresses the efficiency challenges of deploying LLMs in resource-constrained environments, though it appears incremental within the binarization domain.

The paper tackles the problem of reducing large language model complexity through weight binarization, proposing a multi-kernel Boolean framework that enables direct finetuning without latent weights. Experiments show it outperforms recent ultra low-bit quantization and binarization techniques.

Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into two groups: post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show our method outperforms recent ultra low-bit quantization and binarization techniques.
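
To make the idea of "multi-kernel Boolean parameters" concrete, here is a minimal sketch of one way a weight matrix can be approximated by a sum of scaled {-1, +1} kernels. It uses greedy residual binarization, a generic technique swapped in for illustration; it is not the authors' algorithm, and it does not cover the Boolean-domain finetuning the abstract describes. The function names `multi_kernel_binarize` and `reconstruct`, the single scalar scale per kernel, and the random test matrix are all assumptions.

```python
import numpy as np

def multi_kernel_binarize(W, num_kernels=3):
    """Approximate W as sum_k alpha_k * B_k with B_k in {-1, +1}.

    Hypothetical helper: greedy residual binarization with one scalar
    scale per kernel (an assumption, not the paper's scheme).
    """
    residual = W.astype(np.float64).copy()
    alphas, kernels = [], []
    for _ in range(num_kernels):
        # Boolean kernel: the sign pattern of what is still unexplained.
        B = np.where(residual >= 0, 1.0, -1.0)
        # For a fixed sign kernel, the least-squares optimal scalar scale
        # is the mean absolute value of the residual.
        alpha = np.abs(residual).mean()
        alphas.append(alpha)
        kernels.append(B)
        residual -= alpha * B
    return np.array(alphas), np.stack(kernels)

def reconstruct(alphas, kernels):
    """Recombine the scaled Boolean kernels into a dense matrix."""
    return np.tensordot(alphas, kernels, axes=1)

# Stand-in weight matrix; real LLM weights are an assumption left out here.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

errors = []
for k in (1, 2, 3, 4):
    alphas, kernels = multi_kernel_binarize(W, k)
    rel_err = np.linalg.norm(W - reconstruct(alphas, kernels)) / np.linalg.norm(W)
    errors.append(rel_err)
    print(f"kernels={k}  relative error={rel_err:.4f}")
```

In this sketch each added kernel reduces the squared residual norm by n times alpha squared (n being the number of entries), so the relative error shrinks as kernels are added. That is the sense in which multiple Boolean kernels increase representational capacity over a single binarized matrix.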

