Is BitNet superseded?

BitNet (LLM quantization) is superseded: it is cited as a baseline and beaten by newer methods. 5 papers critique it and 1 beats it on benchmarks, which ranks it #15 of 80 most-superseded. It belongs to the sub-problem cluster led by RTN. Newer alternatives in the same sub-problem include STaR-Quant, Timestep-Aware SVDQuant-GPTQ, BWLA, Bit-by-Bit, and Benford-Quant.

Superseded baseline · #15 of 80 most-superseded

BitNet

BitNet: Scaling 1-bit Transformers for Large Language Models

LLM quantization · first seen Oct 17, 2023

superseded — cited as a baseline and beaten by newer methods

5 papers critique it · 1 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites BitNet as a baseline.

“Demonstrates feasibility of extreme quantization but targets different domain (LLMs vs CNNs) and hardware (data center GPUs vs commodity CPUs).”
— True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity
“BitNet has demonstrated the potential of ternary weight representations, yet requires as many as 2T tokens to establish a stable low-bit model.”
— Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
“However, the prolonged training duration and inherently limited scalability significantly constrain their practical deployment.”
— CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs
“this framework typically demands pre-training from scratch to ensure convergence, incurring prohibitive computational costs that hinder widespread adoption”
— HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
“BitNet a4.8 addresses this issue by using resource-intensive quantization-aware training (QAT) to achieve 1-bit weights with 4-bit activations.”
— BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

Beaten on benchmarks

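Head-to-head results where a newer method reports beating BitNet. Each pair is listed as newer method vs BitNet; lower is better for FID and higher is better for IS. Values are copied from the source paper's tables, so verify them against the cited paper.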

RobuQ (w/o AMP) beats BitNet · FID [ImageNet steps=50 cfg=1.5]
17.97 vs 41.59
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
RobuQ (w/o AMP) beats BitNet · IS [ImageNet steps=50 cfg=1.5]
103.24 vs 44.32
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
RobuQ beats BitNet · FID [ImageNet steps=50 cfg=1.5 W1.58A2]
30.30 vs 41.59
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
RobuQ (w/o AMP) beats BitNet · FID [FFHQ steps=50 Uncondition W1.58A4]
25.62 vs 66.55
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
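
The comparisons above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the source page: the names `RESULTS`, `LOWER_IS_BETTER`, `newer_wins`, `relative_change`, and `summarize` are hypothetical, the numbers are copied from the entries above, and the metric directions (FID lower is better, IS higher is better) are assumed from the usual convention rather than stated by the cited paper.

```python
# Values copied from the "Beaten on benchmarks" entries above.
# Each tuple: (method, metric, setting, newer_value, bitnet_value)
RESULTS = [
    ("RobuQ (w/o AMP)", "FID", "ImageNet steps=50 cfg=1.5", 17.97, 41.59),
    ("RobuQ (w/o AMP)", "IS", "ImageNet steps=50 cfg=1.5", 103.24, 44.32),
    ("RobuQ", "FID", "ImageNet steps=50 cfg=1.5 W1.58A2", 30.30, 41.59),
    ("RobuQ (w/o AMP)", "FID", "FFHQ steps=50 Uncondition W1.58A4", 25.62, 66.55),
]

# Assumption: standard metric directions for generative-image benchmarks.
LOWER_IS_BETTER = {"FID": True, "IS": False}


def newer_wins(metric, newer, baseline):
    """Return True if the newer value is better than the baseline for this metric."""
    if LOWER_IS_BETTER[metric]:
        return newer < baseline
    return newer > baseline


def relative_change(newer, baseline):
    """Signed change of the newer value relative to the baseline, in percent."""
    return 100.0 * (newer - baseline) / baseline


def summarize(results=RESULTS):
    """Build one row per comparison: who wins and by how much (percent)."""
    rows = []
    for method, metric, setting, newer, baseline in results:
        rows.append(
            (
                method,
                metric,
                setting,
                newer_wins(metric, newer, baseline),
                round(relative_change(newer, baseline), 1),
            )
        )
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

A negative relative change on an FID row and a positive one on an IS row both indicate that the newer method is ahead under the assumed metric directions.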

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.