Junjie Ma

IT
h-index2
7papers
8citations
Novelty58%
AI Score53

7 Papers

17.7ITApr 7
Asymptotic Analysis of Nonlinear One-Bit Precoding in Massive MIMO Systems via Approximate Message Passing

Zheyu Wu, Junjie Ma, Ya-Feng Liu et al.

Massive multiple-input multiple-output (MIMO) systems employing one-bit digital-to-analog converters offer a hardware-efficient solution for wireless communications. However, the one-bit constraint poses significant challenges for precoding design, as it transforms the problem into a discrete and nonconvex optimization task. In this paper, we investigate a widely adopted ``convex-relaxation-then-quantization" approach for nonlinear symbol-level one-bit precoding. Specifically, we first solve a convex relaxation of the discrete minimum mean square error precoding problem, and then quantize the solution to satisfy the one-bit constraint. Focusing on a real-valued system with an independently and identically distributed (i.i.d.) Gaussian channel, we develop a novel analytical framework based on approximate message passing (AMP) to characterize the high-dimensional asymptotic performance of the considered scheme. The key technical ingredient is an auxiliary AMP iteration that dedicatedly incorporates the nonlinear quantization function into the state evolution analysis. With the proposed framework, we derive a closed-form expression for the symbol error probability (SEP) at the receiver side in the large-system limit, which provides a quantitative characterization of how model and system parameters affect the SEP performance. Our empirical results suggest that the $\ell_\infty^2$ regularizer, when paired with an optimally chosen regularization parameter, achieves optimal SEP performance within a broad class of convex regularization functions. As a first step towards a theoretical justification, we prove the optimality of the $\ell_\infty^2$ regularizer within the mixed $\ell_\infty^2$-$\ell_2^2$ regularization functions.

CLMar 4
GeoBlock: Inferring Block Granularity from Dependency Geometry in Diffusion Language Models

Lipeng Wan, Junjie Ma, Jianhui Gu et al.

Block diffusion enables efficient parallel refinement in diffusion language models, but its decoding behavior depends critically on block size. Existing block-sizing strategies rely on fixed rules or heuristic signals and do not account for the dependency geometry that determines which tokens can be safely refined together. This motivates a geometry view of diffusion decoding: \emph{regions with strong causal ordering require sequential updates, whereas semantically cohesive regions admit parallel refinement.} We introduce GeoBlock, a geometry-aware block inference framework that determines block granularity directly from attention-derived dependency geometry. Instead of relying on predefined schedules or local confidence heuristics, GeoBlock analyzes cross-token dependency patterns to identify geometrically stable refinement regions and dynamically determines appropriate block boundaries during decoding. By adapting block granularity to the dependency geometry, GeoBlock preserves the parallel efficiency of block diffusion while enforcing dependency-consistent refinement that exhibits autoregressive reliability. GeoBlock requires no additional training and integrates seamlessly into existing block diffusion architectures. Extensive experiments across multiple benchmarks show that GeoBlock reliably identifies geometry-consistent block boundaries and improves the accuracy of block diffusion with only a small additional computational budget.

AIDec 16, 2025Code
RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees

Junjie Ma, Jinlong Li

Inference with modern Large Language Models (LLMs) is expensive and slow, and speculative sampling has emerged as an effective solution to this problem, however, the number of the calls to the draft model for generating candidate tokens in speculative sampling is a preset hyperparameter, lacking flexibility. To generate and utilize the candidate tokens more effectively, we propose RADAR, a novel speculative sampling method with RL-based dynamic draft trees. RADAR formulates the draft tree generation process as a Markov Decision Process (MDP) and employs offline reinforcement learning to train a prediction model, which enables real-time decision on the calls to the draft model, reducing redundant computations and further accelerating inference. Evaluations across three LLMs and four tasks show that RADAR achieves a speedup of 3.17x-4.82x over the auto-regressive decoding baseline. The code is available at https://github.com/minaduki-sora/RADAR.

ITDec 22, 2025
Orthogonal Approximate Message Passing with Optimal Spectral Initializations for Rectangular Spiked Matrix Models

Haohua Chen, Songbin Liu, Junjie Ma

We propose an orthogonal approximate message passing (OAMP) algorithm for signal estimation in the rectangular spiked matrix model with general rotationally invariant (RI) noise. We establish a rigorous state evolution that precisely characterizes the algorithm's high-dimensional dynamics and enables the construction of iteration-wise optimal denoisers. Within this framework, we accommodate spectral initializations under minimal assumptions on the empirical noise spectrum. In the rectangular setting, where a single rank-one component typically generates multiple informative outliers, we further propose a procedure for combining these outliers under mild non-Gaussian signal assumptions. For general RI noise models, the predicted performance of the proposed optimal OAMP algorithm agrees with replica-symmetric predictions for the associated Bayes-optimal estimator, and we conjecture that it is statistically optimal within a broad class of iterative estimation methods.

STDec 2, 2024
Unifying AMP Algorithms for Rotationally-Invariant Models

Songbin Liu, Junjie Ma

This paper presents a unified framework for constructing Approximate Message Passing (AMP) algorithms for rotationally-invariant models. By employing a general iterative algorithm template and reducing it to long-memory Orthogonal AMP (OAMP), we systematically derive the correct Onsager terms of AMP algorithms. This approach allows us to rederive an AMP algorithm introduced by Fan and Opper et al., while shedding new light on the role of free cumulants of the spectral law. The free cumulants arise naturally from a recursive centering operation, potentially of independent interest beyond the scope of AMP. To illustrate the flexibility of our framework, we introduce two novel AMP variants and apply them to estimation in spiked models.

AIMar 4
Progressive Refinement Regulation for Accelerating Diffusion Language Model Decoding

Lipeng Wan, Jianhui Gu, Junjie Ma et al.

Diffusion language models generate text through iterative denoising under a uniform refinement rule applied to all tokens. However, tokens stabilize at different rates in practice, leading to substantial redundant refinement and motivating refinement control over the denoising process. Existing approaches typically assess refinement necessity from instantaneous, step-level signals under a fixed decoding process. In contrast, whether a token has converged is defined by how its prediction changes along its future refinement trajectory. Moreover, changing the refinement rule reshapes future refinement trajectories, which in turn determine how refinement rules should be formulated, making refinement control inherently dynamic. We propose \emph{Progressive Refinement Regulation} (PRR), a progressive, trajectory-grounded refinement control framework that derives a token-level notion of empirical convergence progress from full decoding rollouts. Based on this signal, PRR learns a lightweight token-wise controller to regulate refinement via temperature-based distribution shaping under a progressive self-evolving training scheme. Experiments show that PRR substantially accelerates diffusion language model decoding while preserving generation quality.

ITNov 5, 2021
Towards Designing Optimal Sensing Matrices for Generalized Linear Inverse Problems

Junjie Ma, Ji Xu, Arian Maleki

We consider an inverse problem $\mathbf{y}= f(\mathbf{Ax})$, where $\mathbf{x}\in\mathbb{R}^n$ is the signal of interest, $\mathbf{A}$ is the sensing matrix, $f$ is a nonlinear function and $\mathbf{y} \in \mathbb{R}^m$ is the measurement vector. In many applications, we have some level of freedom to design the sensing matrix $\mathbf{A}$, and in such circumstances we could optimize $\mathbf{A}$ to achieve better reconstruction performance. As a first step towards optimal design, it is important to understand the impact of the sensing matrix on the difficulty of recovering $\mathbf{x}$ from $\mathbf{y}$. In this paper, we study the performance of one of the most successful recovery methods, i.e., the expectation propagation (EP) algorithm. We define a notion of spikiness for the spectrum of $\bmmathbfA}$ and show the importance of this measure for the performance of EP. We show that whether a spikier spectrum can hurt or help the recovery performance depends on $f$. Based on our framework, we are able to show that, in phase-retrieval problems, matrices with spikier spectrums are better for EP, while in 1-bit compressed sensing problems, less spiky spectrums lead to better performance. Our results unify and substantially generalize existing results that compare Gaussian and orthogonal matrices, and provide a platform towards designing optimal sensing systems.