83.8CLApr 7
The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language ModelsMichael Rizvi-Martel, Guillaume Rabusseau, Marius Mosbach
Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: i) pretraining on natural language data biases models to commit to a token in the last layers ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
QUANT-PHOct 30, 2025
FlowQ-Net: A Generative Framework for Automated Quantum Circuit DesignJun Dai, Michael Rizvi-Martel, Guillaume Rabusseau
Designing efficient quantum circuits is a central bottleneck to exploring the potential of quantum computing, particularly for noisy intermediate-scale quantum (NISQ) devices, where circuit efficiency and resilience to errors are paramount. The search space of gate sequences grows combinatorially, and handcrafted templates often waste scarce qubit and depth budgets. We introduce \textsc{FlowQ-Net} (Flow-based Quantum design Network), a generative framework for automated quantum circuit synthesis based on Generative Flow Networks (GFlowNets). This framework learns a stochastic policy to construct circuits sequentially, sampling them in proportion to a flexible, user-defined reward function that can encode multiple design objectives such as performance, depth, and gate count. This approach uniquely enables the generation of a diverse ensemble of high-quality circuits, moving beyond single-solution optimization. We demonstrate the efficacy of \textsc{FlowQ-Net} through an extensive set of simulations. We apply our method to Variational Quantum Algorithm (VQA) ansatz design for molecular ground state estimation, Max-Cut, and image classification, key challenges in near-term quantum computing. Circuits designed by \textsc{FlowQ-Net} achieve significant improvements, yielding circuits that are 10$\times$-30$\times$ more compact in terms of parameters, gates, and depth compared to commonly used unitary baselines, without compromising accuracy. This trend holds even when subjected to error profiles from real-world quantum devices. Our results underline the potential of generative models as a general-purpose methodology for automated quantum circuit design, offering a promising path towards more efficient quantum algorithms and accelerating scientific discovery in the quantum domain.
NAJul 28, 2025
Numerical PDE solvers outperform neural PDE solversPatrick Chatain, Michael Rizvi-Martel, Guillaume Rabusseau et al.
We present DeepFDM, a differentiable finite-difference framework for learning spatially varying coefficients in time-dependent partial differential equations (PDEs). By embedding a classical forward-Euler discretization into a convolutional architecture, DeepFDM enforces stability and first-order convergence via CFL-compliant coefficient parameterizations. Model weights correspond directly to PDE coefficients, yielding an interpretable inverse-problem formulation. We evaluate DeepFDM on a benchmark suite of scalar PDEs: advection, diffusion, advection-diffusion, reaction-diffusion and inhomogeneous Burgers' equations-in one, two and three spatial dimensions. In both in-distribution and out-of-distribution tests (quantified by the Hellinger distance between coefficient priors), DeepFDM attains normalized mean-squared errors one to two orders of magnitude smaller than Fourier Neural Operators, U-Nets and ResNets; requires 10-20X fewer training epochs; and uses 5-50X fewer parameters. Moreover, recovered coefficient fields accurately match ground-truth parameters. These results establish DeepFDM as a robust, efficient, and transparent baseline for data-driven solution and identification of parametric PDEs.
34.1LGApr 2
On the Role of Depth in the Expressivity of RNNsMaude Lizaire, Michael Rizvi-Martel, Éric Dupuis et al.
The benefits of depth in feedforward neural networks are well known: composing multiple layers of linear transformations with nonlinear activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs' memory capacity efficiently with respect to the number of parameters, thus enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We broaden our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.
MAOct 14, 2025
Benefits and Limitations of Communication in Multi-Agent ReasoningMichael Rizvi-Martel, Satwik Bhattamishra, Neil Rathi et al.
Chain-of-thought prompting has popularized step-by-step reasoning in large language models, yet model performance still degrades as problem complexity and context length grow. By decomposing difficult tasks with long contexts into shorter, manageable ones, recent multi-agent paradigms offer a promising near-term solution to this problem. However, the fundamental capacities of such systems are poorly understood. In this work, we propose a theoretical framework to analyze the expressivity of multi-agent systems. We apply our framework to three algorithmic families: state tracking, recall, and $k$-hop reasoning. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained. We complement our theoretical analysis with a set of experiments on pretrained LLMs using controlled synthetic benchmarks. Empirical outcomes confirm the tradeoffs between key quantities predicted by our theory. Collectively, our analysis offers principled guidance for designing scalable multi-agent reasoning systems.
LGJun 7, 2024
A Tensor Decomposition Perspective on Second-order RNNsMaude Lizaire, Michael Rizvi-Martel, Marawan Gamal Abdel Hameed et al.
Second-order Recurrent Neural Networks (2RNNs) extend RNNs by leveraging second-order interactions for sequence modelling. These models are provably more expressive than their first-order counterparts and have connections to well-studied models from formal language theory. However, their large parameter tensor makes computations intractable. To circumvent this issue, one approach known as MIRNN consists in limiting the type of interactions used by the model. Another is to leverage tensor decomposition to diminish the parameter count. In this work, we study the model resulting from parameterizing 2RNNs using the CP decomposition, which we call CPRNN. Intuitively, the rank of the decomposition should reduce expressivity. We analyze how rank and hidden size affect model capacity and show the relationships between RNNs, 2RNNs, MIRNNs, and CPRNNs based on these parameters. We support these results empirically with experiments on the Penn Treebank dataset which demonstrate that, with a fixed parameter budget, CPRNNs outperforms RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.