CL · Jan 30

DecompressionLM: Deterministic, Diagnostic, and Zero-Shot Concept Graph Extraction from Language Models

arXiv:2602.00377v2 · h-index: 5
AI Analysis

This work addresses a key limitation of existing knowledge-probing methods for language models. It offers a novel diagnostic tool for assessing knowledge breadth and factual grounding in compressed models, although its improvements to extraction techniques are incremental.

The paper tackles the problem of extracting concept graphs from language models without pre-defined queries. It introduces DecompressionLM, a zero-shot framework that combines Van der Corput sequences with arithmetic decoding to achieve deterministic, parallel generation. Among its findings, activation-aware quantization expands concept coverage by 30-170%, while uniform quantization causes a 71-86% coverage collapse.

Existing knowledge-probing methods rely on pre-defined queries, which limits extraction to known concepts. We introduce DecompressionLM, a stateless framework for zero-shot concept graph extraction that discovers what language models encode without pre-specified queries or shared cross-sequence state. Our method targets three limitations of common decoding-based probing approaches: (i) cross-sequence coupling that concentrates probability mass on high-frequency prefixes, (ii) competitive decoding effects that suppress long-tail concepts, and (iii) scalability constraints arising from sequential exploration. By combining Van der Corput low-discrepancy sequences with arithmetic decoding, DecompressionLM enables deterministic, embarrassingly parallel generation without shared state across sequences. Across two model families and five quantization variants, we find that activation-aware quantization (AWQ-4bit) expands concept coverage by 30-170%, while uniform quantization (GPTQ-Int4) induces a 71-86% coverage collapse. Explanation-level perplexity does not reliably reflect these divergent behaviors. Corpus-based verification further reveals a 19.6-point hallucination gap between the top- and bottom-ranked MMLU-Pro Law models. DecompressionLM establishes concept coverage as a complementary evaluation dimension for assessing knowledge breadth and factual grounding in compressed models intended for deployment.
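The pairing of Van der Corput points with arithmetic decoding can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a hypothetical `toy_model` standing in for a language model's next-token distribution, and it uses plain floating-point arithmetic rather than exact-precision arithmetic coding. Each index `i` is mapped to a point in [0, 1), and that point is decoded into a token sequence by repeatedly inverting the next-token CDF. As a result, every sequence is deterministic and independent of every other.

```python
"""Sketch: deterministic, stateless sequence generation via Van der Corput points
and arithmetic decoding.

Assumptions (not from the paper): `toy_model` is a hypothetical stand-in for an
LM's next-token distribution; decoding uses floats, not exact arithmetic coding.
"""
from typing import Callable, List, Sequence, Tuple

Dist = List[Tuple[str, float]]  # (token, probability) pairs in a fixed order


def van_der_corput(n: int, base: int = 2) -> float:
    """n-th Van der Corput point: mirror the base-`base` digits of n about the radix point."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


def arithmetic_decode(u: float, next_token_probs: Callable[[Sequence[str]], Dist],
                      max_len: int = 8, eos: str = "<eos>") -> List[str]:
    """Map a point u in [0, 1) to a token sequence by inverting the next-token CDF."""
    tokens: List[str] = []
    for _ in range(max_len):
        probs = next_token_probs(tokens)   # fixed token order defines the CDF
        low = 0.0
        for tok, p in probs:
            if u < low + p:                # u lies in this token's interval [low, low + p)
                break
            low += p
        else:
            low -= p                       # float slack: fall back to the last token's interval
        if tok == eos:
            break
        tokens.append(tok)
        u = (u - low) / p                  # rescale u back into [0, 1) for the next step
    return tokens


def toy_model(prefix: Sequence[str]) -> Dist:
    """Hypothetical next-token distribution standing in for a language model."""
    if not prefix:
        return [("law", 0.5), ("court", 0.3), ("tort", 0.2)]
    return [("<eos>", 0.5), ("case", 0.25), ("rule", 0.25)]


if __name__ == "__main__":
    # Each index is decoded on its own: no state is shared across sequences,
    # so indices could be split across workers in any order.
    outputs = [arithmetic_decode(van_der_corput(i), toy_model) for i in range(1, 5)]
    print(outputs)  # [['court'], ['law', 'case'], ['court', 'rule'], ['law']]
```

In base 2, the first 2^k Van der Corput points hit every dyadic interval of length 2^-k exactly once. Low-probability branches are therefore still reached in a predictable number of steps rather than being crowded out by high-frequency prefixes. Here, the 0.2-probability token "tort" first appears at index 7.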
