ITJun 23, 2022
Sufficient Statistic Memory Approximate Message PassingLei Liu, Shunqi Huang, Brian M. Kurkoski
Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. However, state evolution does not necessarily guarantee the convergence of iterative algorithms. To solve the convergence problem of AMP-type algorithms in principle, this paper proposes a memory AMP (MAMP) under a sufficient statistic condition, named sufficient statistic MAMP (SS-MAMP). We show that the covariance matrices of SS-MAMP are L-banded and convergent. Given an arbitrary MAMP, we can construct the SS-MAMP by damping, which not only ensures the convergence, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution.
LGSep 25, 2025
Binary Autoencoder for Mechanistic Interpretability of Large Language ModelsHakaze Cho, Haolin Yang, Brian M. Kurkoski et al.
Existing works are dedicated to untangling atomized numerical components (features) from the hidden states of Large Language Models (LLMs) for interpreting their mechanism. However, they typically rely on autoencoders constrained by some implicit training-time regularization on single training instances (i.e., $L_1$ normalization, top-k function, etc.), without an explicit guarantee of global sparsity among instances, causing a large amount of dense (simultaneously inactive) features, harming the feature sparsity and atomization. In this paper, we propose a novel autoencoder variant that enforces minimal entropy on minibatches of hidden activations, thereby promoting feature independence and sparsity across instances. For efficient entropy calculation, we discretize the hidden activations to 1-bit via a step function and apply gradient estimation to enable backpropagation, so that we term it as Binary Autoencoder (BAE) and empirically demonstrate two major applications: (1) Feature set entropy calculation. Entropy can be reliably estimated on binary hidden activations, which we empirically evaluate and leverage to characterize the inference dynamics of LLMs and In-context Learning. (2) Feature untangling. Similar to typical methods, BAE can extract atomized features from LLM's hidden states. To robustly evaluate such feature extraction capability, we refine traditional feature-interpretation methods to avoid unreliable handling of numerical tokens, and show that BAE avoids dense features while producing the largest number of interpretable ones among baselines, which confirms the effectiveness of BAE serving as a feature extractor.
ITDec 31, 2021
Sufficient-Statistic Memory AMPLei Liu, Shunqi Huang, YuZhi Yang et al.
Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. While state evolution is a useful analytic tool, its convergence is not guaranteed. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a sufficient-statistic memory AMP (SS-MAMP) algorithm framework under the conditions of right-unitarily invariant sensing matrices, Lipschitz-continuous local processors and the sufficient-statistic constraint (i.e., the current message of each local processor is a sufficient statistic of the signal vector given the current and all preceding messages). We show that the covariance matrices of SS-MAMP are L-banded and convergent, which is an optimal framework (from the local MMSE/LMMSE perspective) for AMP-type algorithms given the Lipschitz-continuous local processors. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As an example, we construct a sufficient-statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods when it has a unique fixed point. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to support the theoretical results.
ITJun 4, 2021
Memory Approximate Message PassingLei Liu, Shunqi Huang, Brian M. Kurkoski
Approximate message passing (AMP) is a low-cost iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions. However, AMP only applies to independent identically distributed (IID) transform matrices, but may become unreliable for other matrix ensembles, especially for ill-conditioned ones. To handle this difficulty, orthogonal/vector AMP (OAMP/VAMP) was proposed for general right-unitarily-invariant matrices. However, the Bayes-optimal OAMP/VAMP requires high-complexity linear minimum mean square error estimator. To solve the disadvantages of AMP and OAMP/VAMP, this paper proposes a memory AMP (MAMP), in which a long-memory matched filter is proposed for interference suppression. The complexity of MAMP is comparable to AMP. The asymptotic Gaussianity of estimation errors in MAMP is guaranteed by the orthogonality principle. A state evolution is derived to asymptotically characterize the performance of MAMP. Based on the state evolution, the relaxation parameters and damping vector in MAMP are optimized. For all right-unitarily-invariant matrices, the optimized MAMP converges to OAMP/VAMP, and thus is Bayes-optimal if it has a unique fixed point. Finally, simulations are provided to verify the validity and accuracy of the theoretical results.
ITDec 20, 2020
Memory AMPLei Liu, Shunqi Huang, Brian M. Kurkoski
Approximate message passing (AMP) is a low-cost iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions. AMP only applies to independent identically distributed (IID) transform matrices, but may become unreliable (e.g., perform poorly or even diverge) for other matrix ensembles, especially for ill-conditioned ones. To solve this issue, orthogonal/vector AMP (OAMP/VAMP) was proposed for general right-unitarily-invariant matrices. However, the Bayes-optimal OAMP/VAMP (BO-OAMP/VAMP) requires a high-complexity linear minimum mean square error (MMSE) estimator. This prevents OAMP/VAMP from being used in large-scale systems. To address the drawbacks of AMP and BO-OAMP/VAMP, this paper offers a memory AMP (MAMP) framework based on the orthogonality principle, which ensures that estimation errors in MAMP are asymptotically IID Gaussian. To realize the required orthogonality for MAMP, we provide an orthogonalization procedure for the local memory estimators. In addition, we propose a Bayes-optimal MAMP (BO-MAMP), in which a long-memory matched filter is used for interference suppression. The complexity of BO-MAMP is comparable to AMP. To asymptotically characterize the performance of BO-MAMP, a state evolution is derived. The relaxation parameters and damping vector in BO-MAMP are optimized based on state evolution. Most crucially, the state evolution of the optimized BO-MAMP converges to the same fixed point as that of the high-complexity BO-OAMP/VAMP for all right-unitarily-invariant matrices, and achieves the Bayes optimal MSE predicted by the replica method if its state evolution has a unique fixed point. Finally, simulations are provided to verify the theoretical results' validity and accuracy.