AQ-SGD
LLM quantization
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites AQ-SGD as a baseline.
“AQ-SGD requires storing previous activations for the whole dataset to compute these changes, resulting in substantial memory overhead. Such an approach poses practical limitations, especially in resource-constrained environments for the large-volumes of training data where storage capacity and system complexity are critical considerations.”
— TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
“Existing strategies to reduce communication overhead, such as activation quantization wang2023finetuninglanguagemodelsslow, lin2022lglsqlearnedgradientlinear, wu2023estimatormeetsequilibriumperspective, chen2024channel, yang2024gwq, typically use static precision and thus fail to adapt to flexible bit-width throughout training.”
— AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
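The memory concern raised in the TAH-QUANT quote can be illustrated with a small sketch. It assumes a simple uniform quantizer and a dictionary keyed by a sample id; the names `quantize`, `DeltaActivationCache`, and `cached_floats` are hypothetical and chosen only for illustration. This is not the AQ-SGD authors' implementation, only a toy model of "quantize the change relative to a stored previous activation":

```python
from typing import Dict, List


def quantize(values: List[float], bits: int) -> List[float]:
    """Simulated uniform symmetric quantization (quantize, then dequantize)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0.0] * len(values)
    scale = peak / levels
    return [round(v / scale) * scale for v in values]


class DeltaActivationCache:
    """Hypothetical per-sample cache: one stored activation per training example."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.cache: Dict[int, List[float]] = {}

    def send(self, sample_id: int, activation: List[float]) -> List[float]:
        """Return the activation the receiver would reconstruct."""
        previous = self.cache.get(sample_id)
        if previous is None:
            # Assumption: the first visit has no reference, so it is sent in full.
            reconstructed = list(activation)
        else:
            # Later visits send only the quantized change since the last visit.
            delta = [a - p for a, p in zip(activation, previous)]
            q_delta = quantize(delta, self.bits)
            reconstructed = [p + d for p, d in zip(previous, q_delta)]
        self.cache[sample_id] = reconstructed
        return reconstructed

    def cached_floats(self) -> int:
        """Number of stored values; grows with the number of distinct samples."""
        return sum(len(v) for v in self.cache.values())


if __name__ == "__main__":
    cache = DeltaActivationCache(bits=4)
    cache.send(0, [1.0, -2.0])
    cache.send(1, [0.5, 0.5])
    print(cache.send(0, [1.1, -2.2]))
    print(cache.cached_floats())
```

In this toy model the cache holds one activation per distinct sample, so its size scales with the dataset rather than with the batch, which is the storage overhead the quote describes.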
Beaten on benchmarks
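Head-to-head results where a newer method reports beating AQ-SGD. Values are copied from the source paper's tables; verify them against the cited paper. Each pair appears to be listed as newer method vs AQ-SGD, and for perplexity (PPL) lower is better.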
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [LLaMA3-8B, GSM8k]
1.614 vs 1.636
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [LLaMA3-8B, MATH]
1.905 vs 1.924
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [LLaMA3-8B, Code-Alpaca]
1.750 vs 1.805
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Qwen2.5-14B, GSM8k]
1.462 vs 1.477
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Qwen2.5-14B, MATH]
1.667 vs 1.694
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Qwen2.5-14B, Code-Alpaca]
1.609 vs 1.919
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Phi-3-Medium, GSM8k]
1.407 vs 1.426
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Phi-3-Medium, MATH]
1.634 vs 1.648
- AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
AMAQ beats AQ-SGD · PPL [Phi-3-Medium, Code-Alpaca]
1.504 vs 1.736
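To put the gaps above in proportion, the following sketch computes the relative perplexity reduction for each entry. It assumes that the first number in every pair is AMAQ and the second is AQ-SGD (consistent with lower PPL being better, but not stated explicitly in the entries); the helper names `relative_reduction` and `summarize` are hypothetical:

```python
from typing import List, Tuple

# (model, dataset, AMAQ PPL, AQ-SGD PPL), copied from the entries above.
# Assumption: the first value of each "x vs y" pair belongs to AMAQ.
RESULTS: List[Tuple[str, str, float, float]] = [
    ("LLaMA3-8B", "GSM8k", 1.614, 1.636),
    ("LLaMA3-8B", "MATH", 1.905, 1.924),
    ("LLaMA3-8B", "Code-Alpaca", 1.750, 1.805),
    ("Qwen2.5-14B", "GSM8k", 1.462, 1.477),
    ("Qwen2.5-14B", "MATH", 1.667, 1.694),
    ("Qwen2.5-14B", "Code-Alpaca", 1.609, 1.919),
    ("Phi-3-Medium", "GSM8k", 1.407, 1.426),
    ("Phi-3-Medium", "MATH", 1.634, 1.648),
    ("Phi-3-Medium", "Code-Alpaca", 1.504, 1.736),
]


def relative_reduction(new: float, baseline: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100.0


def summarize(results: List[Tuple[str, str, float, float]]) -> List[Tuple[str, str, float]]:
    """Return (model, dataset, percent reduction), largest reduction first."""
    rows = [(m, d, relative_reduction(a, b)) for m, d, a, b in results]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for model, dataset, pct in summarize(RESULTS):
        print(f"{model:14s} {dataset:12s} {pct:5.2f}% lower PPL")
```

Under that ordering assumption, the two Code-Alpaca entries for Qwen2.5-14B and Phi-3-Medium show much larger reductions than the GSM8k and MATH entries, where the differences are around one to two percent.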
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.