John M. Cioffi

LG
h-index19
7papers
34citations
Novelty65%
AI Score54

7 Papers

ITJul 3, 2022
Scalable Polar Code Construction for Successive Cancellation List Decoding: A Graph Neural Network-Based Approach

Yun Liao, Seyyed Ali Hashemi, Hengjie Yang et al.

While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar codes for cyclic-redundancy-check-aided successive-cancellation list (CA-SCL) decoding in an efficient and scalable manner still awaits investigation. This paper first maps a polar code to a unique heterogeneous graph called the polar-code-construction message-passing (PCCMP) graph. Next, a heterogeneous graph-neural-network-based iterative message-passing (IMP) algorithm is proposed which aims to find a PCCMP graph that corresponds to the polar code with minimum frame error rate under CA-SCL decoding. This new IMP algorithm's major advantage lies in its scalability power. That is, the model complexity is independent of the blocklength and code rate, and a trained IMP model over a short polar code can be readily applied to a long polar code's construction. Numerical experiments show that IMP-based polar-code constructions outperform classical constructions under CA-SCL decoding. In addition, when an IMP model trained on a length-128 polar code directly applies to the construction of polar codes with different code rates and blocklengths, simulations show that these polar code constructions deliver comparable performance to the 5G polar codes.

LGApr 23
Continuous-Utility Direct Preference Optimization

Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal et al.

Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with K strategies yields a Theta(K log K) improvement in sample complexity over binary preferences, and that DPO converges to the entropy-regularized utility-maximizing policy. To exploit this signal, we propose a two-stage training pipeline: (i) strategy selection, which optimizes the model to choose the best strategy for a given problem via best-vs-all comparisons, and (ii) execution refinement, which trains the model to correctly execute the selected strategy using margin-stratified pairs. On mathematical reasoning benchmarks, CU-DPO improves strategy selection accuracy from 35-46 percent to 68-78 percent across seven base models, yielding consistent downstream reasoning gains of up to 6.6 points on in-distribution datasets with effective transfer to out-of-distribution tasks.

LGMay 18
General Preference Reinforcement Learning

Muhammad Umer, Muhammad Ahmed Mohsin, Ahsan Bilal et al.

Post-training has split large language model (LLM) alignment into two largely disconnected tracks. Online reinforcement learning (RL) with verifiable rewards drives emergent reasoning on math and code but depends on a programmatic verifier that cannot reach open-ended tasks, while preference optimization handles open-ended generation yet forgoes the continuous exploration that powers online RL. Closing this gap requires a verifier for open-ended quality, but a scalar reward model is the wrong shape for the job. Quality is multi-dimensional, and any scalar score is an incomplete proxy that lets online RL collapse onto whichever axis the score is most sensitive to. We turn instead to the General Preference Model (GPM), which embeds responses into $k$ skew-symmetric subspaces and represents preference as a structured, intransitivity-aware comparison. Building on this, we propose General Preference Reinforcement Learning (GPRL), which carries the $k$-way structure through to the policy update. GPRL computes per-dimension group-relative advantages, normalizes each on its own scale so no axis can dominate, and aggregates them with context-dependent eigenvalues. The same structure powers a closed-loop drift monitor that detects single-axis exploitation and corrects it on the fly by reweighting dimensions and tightening the trust region. Starting from $\texttt{Llama-3-8B-Instruct}$, GPRL reaches a length-controlled win rate of $56.51\%$ on AlpacaEval~2.0 while also outperforming SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench by resisting reward hacking across extended training runs.

LGMay 11
Epistemic Uncertainty for Test-Time Discovery

Kainat Riaz, Muhammad Ahmed Mohsin, Ahsan Bilal et al.

Automated scientific discovery using large language models relies on identifying genuinely novel solutions. Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns. As a result, the maximum reward plateaus even as the average reward increases. Overcoming this limitation requires a signal that distinguishes unexplored regions from intrinsically difficult problems. This necessitates measuring disagreement across independently adapted weight hypotheses rather than relying on a single network's confidence. UG-TTT addresses this challenge by maintaining a small ensemble of low-rank adapters over a frozen base model. The per-token disagreement, quantified as the mutual information between ensemble predictions and weight hypotheses, isolates epistemic uncertainty and identifies positions where insufficient coverage leads to adapter divergence rather than intrinsic problem difficulty. This measure is incorporated as an exploration bonus into the policy gradient, directing the policy toward positions where persistent adapter disagreement signals low training coverage, the same frontier where genuine discovery is possible. A nuclear norm regularizer ensures the adapters remain distinct from one another, thereby preserving the exploration signal throughout training. Across four scientific discovery benchmarks, UG-TTT increases the maximum reward on three tasks, maintains substantially higher solution diversity, and an ablation study confirms that the regularizer is essential for sustaining this behavior.

CVJan 20, 2025
Successive Interference Cancellation-aided Diffusion Models for Joint Channel Estimation and Data Detection in Low Rank Channel Scenarios

Sagnik Bhattacharya, Muhammad Ahmed Mohsin, Kamyar Rajabalifardi et al.

This paper proposes a novel joint channel-estimation and source-detection algorithm using successive interference cancellation (SIC)-aided generative score-based diffusion models. Prior work in this area focuses on massive MIMO scenarios, which are typically characterized by full-rank channels, and fail in low-rank channel scenarios. The proposed algorithm outperforms existing methods in joint source-channel estimation, especially in low-rank scenarios where the number of users exceeds the number of antennas at the access point (AP). The proposed score-based iterative diffusion process estimates the gradient of the prior distribution on partial channels, and recursively updates the estimated channel parts as well as the source. Extensive simulation results show that the proposed method outperforms the baseline methods in terms of normalized mean squared error (NMSE) and symbol error rate (SER) in both full-rank and low-rank channel scenarios, while having a more dominant effect in the latter, at various signal-to-noise ratios (SNR).

ITJan 20, 2025
Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels

Sagnik Bhattacharya, Muhammad Ahmed Mohsin, Ahsan Bilal et al.

Emerging wireless AR/VR applications require real-time transmission of correlated high-fidelity speech from multiple resource-constrained devices over unreliable, bandwidth-limited channels. Existing autoencoder-based speech source coding methods fail to address the combination of the following - (1) dynamic bitrate adaptation without retraining the model, (2) leveraging correlations among multiple speech sources, and (3) balancing downstream task loss with realism of reconstructed speech. We propose a neural distributed principal component analysis (NDPCA)-aided distributed source coding algorithm for correlated speech sources transmitting to a central receiver. Our method includes a perception-aware downstream task loss function that balances perceptual realism with task-specific performance. Experiments show significant PSNR improvements under bandwidth constraints over naive autoencoder methods in task-agnostic (19%) and task-aware settings (52%). It also approaches the theoretical upper bound, where all correlated sources are sent to a single encoder, especially in low-bandwidth scenarios. Additionally, we present a rate-distortion-perception trade-off curve, enabling adaptive decisions based on application-specific realism needs.

LGNov 17, 2025
On the Fundamental Limits of LLMs at Scale

Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal et al.

Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation, (4) retrieval fragility, and (5) multimodal misalignment. While existing surveys describe these phenomena empirically, they lack a rigorous theoretical synthesis connecting them to the foundational limits of computation, information, and learning. This work closes that gap by presenting a unified, proof-informed framework that formalizes the innate theoretical ceilings of LLM scaling. First, computability and uncomputability imply an irreducible residue of error: for any computably enumerable model family, diagonalization guarantees inputs on which some model must fail, and undecidable queries (e.g., halting-style tasks) induce infinite failure sets for all computable predictors. Second, information-theoretic and statistical constraints bound attainable accuracy even on decidable tasks, finite description length enforces compression error, and long-tail factual knowledge requires prohibitive sample complexity. Third, geometric and computational effects compress long contexts far below their nominal size due to positional under-training, encoding attenuation, and softmax crowding. We further show how likelihood-based training favors pattern completion over inference, how retrieval under token limits suffers from semantic drift and coupling noise, and how multimodal scaling inherits shallow cross-modal alignment. Across sections, we pair theorems and empirical evidence to outline where scaling helps, where it saturates, and where it cannot progress, providing both theoretical foundations and practical mitigation paths like bounded-oracle retrieval, positional curricula, and sparse or hierarchical attention.