78.5 · SY · May 27
Digital-Based Potentiostat and Mesoporous Microelectrode Co-Design for Non-Enzymatic Glucose Detection at 0.3V-VDD and 1.65nW-Power
Andrea De Gregorio, Mara Serrapede, Danilo Kaddouri et al.
This paper presents a proof-of-concept ultra-low-voltage, ultra-low-power chronoamperometric electrochemical sensor readout integrated circuit (IC) in 130nm CMOS for non-enzymatic glucose detection, featuring a reconfigurable Digital-Based (DB) potentiostat. The signal-transfer and noise characteristics of the new digital-based architecture are described analytically in the frequency domain for the first time, through an equivalent linearized model that is validated by simulations and experiments. Experimentally, the proposed DB potentiostat detects electrochemical currents over a wide range, from 600pA to 650nA, with R² = 0.991 linearity, and consumes only 1.65nW (53.5nW) at VDD = 300mV (VDD = 500mV). The DB readout is tested in a proof-of-concept platform for non-enzymatic glucose detection with nanostructured microelectrodes. It successfully detects glucose non-enzymatically at physiological levels at the lowest reported voltage and power, even in the presence of an interferent (ascorbic acid) and under aerobic conditions, which shows strong potential for emerging Point-of-Care (PoC) diagnostics applications.
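Simple arithmetic can cross-check the headline numbers. Dividing the reported power by the supply voltage gives the average supply current. The ratio of the largest to the smallest detected current gives the input dynamic range.

This is a minimal sketch. It assumes that the quoted power figures are the total average power drawn from a single VDD rail; the abstract does not state how power was measured.

```python
import math

# Figures quoted in the abstract above
I_MIN = 600e-12   # smallest detected current, A
I_MAX = 650e-9    # largest detected current, A
OPERATING_POINTS = {0.3: 1.65e-9, 0.5: 53.5e-9}  # VDD (V) -> reported power (W)

def supply_current(power_w: float, vdd_v: float) -> float:
    """Average supply current I = P / VDD (assumes a single supply rail)."""
    return power_w / vdd_v

def dynamic_range_db(i_min: float, i_max: float) -> float:
    """Input current dynamic range in dB: 20 * log10(i_max / i_min)."""
    return 20.0 * math.log10(i_max / i_min)

if __name__ == "__main__":
    for vdd, p in OPERATING_POINTS.items():
        i_nA = supply_current(p, vdd) * 1e9
        print(f"VDD = {vdd:.1f} V: P = {p * 1e9:.2f} nW -> I = {i_nA:.1f} nA")
    ratio = I_MAX / I_MIN
    print(f"current range ratio: {ratio:.0f}x ({dynamic_range_db(I_MIN, I_MAX):.1f} dB)")
```

Under that assumption, the average supply current is 5.5 nA at 0.3 V and 107 nA at 0.5 V. The detected current range spans roughly three decades, or about 60.7 dB.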
74.1 · LG · May 8
Scaling Limits of Long-Context Transformers
Giuseppe Bruno, Shi Chen, Zhengjiang Lin et al.
We study the long-context limit of softmax self-attention with a fixed query $q$ and a random context of $n$ i.i.d. keys on the sphere. We view the inverse temperature $\beta_n$ as the scaling parameter that decides whether attention degenerates into uniform averaging or collapses onto the single closest key. We show that the critical scale at which selectivity emerges is determined by the local exponent of the distance-to-query distribution near zero rather than by global features of the context, and that it scales as $\beta_n^\ast \asymp n^{2/(d-1)}$ for uniform keys on $\mathbb{S}^{d-1}$. Furthermore, we characterize the limiting laws of the ordered attention weights and of the attention output across all regimes of $\beta_n$:
- a subcritical regime, in which the output reduces to a local average around $q$ with explicit deterministic bias and Gaussian fluctuations;
- a critical regime, in which a finite collection of nearest keys retains macroscopic mass without single-key collapse;
- a supercritical regime, in which all mass concentrates on the closest key.

Of particular interest is the subcritical case with identity value matrix, in which the attention map approximately implements a backward heat equation.
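The phase transition in $\beta_n$ can be explored with a Monte Carlo experiment. This is an illustrative sketch, not the authors' code. It assumes that:
- the query lies on the sphere;
- attention logits are $\beta_n\langle q, k_i\rangle$.

Since $\langle q,k\rangle = 1 - \|q-k\|^2/2$, the softmax favors the closest keys. Heuristically, the nearest of $n$ uniform keys on $\mathbb{S}^{d-1}$ sits at distance of order $n^{-1/(d-1)}$, so selectivity appears once $\beta_n$ times that squared distance is of order one. The constant `c` in $\beta_n = c\,n^{2/(d-1)}$ is a hypothetical knob: small `c` should spread the mass, and large `c` should push the largest weight toward 1.

```python
import numpy as np

def uniform_sphere(n: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Sample n points uniformly on the unit sphere S^{d-1} in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def attention_weights(q: np.ndarray, keys: np.ndarray, beta: float) -> np.ndarray:
    """Softmax weights w_i proportional to exp(beta * <q, k_i>), computed stably."""
    logits = beta * (keys @ q)
    logits -= logits.max()          # subtract the max logit to avoid overflow
    w = np.exp(logits)
    return w / w.sum()

def max_weight(n: int, d: int, c: float, trials: int = 200, seed: int = 0) -> float:
    """Average largest attention weight at beta_n = c * n**(2/(d-1))."""
    rng = np.random.default_rng(seed)
    q = np.zeros(d)
    q[0] = 1.0                      # fixed query: the "north pole"
    beta = c * n ** (2.0 / (d - 1))
    samples = [attention_weights(q, uniform_sphere(n, d, rng), beta).max()
               for _ in range(trials)]
    return float(np.mean(samples))

if __name__ == "__main__":
    for c in (0.01, 1.0, 100.0):
        print(c, [round(max_weight(n, d=3, c=c), 3) for n in (100, 1000, 10000)])
```

Holding `c` fixed while `n` grows is a direct way to probe the critical scaling. At the critical scale, the largest weight should stay away from both 0 and 1.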
73.5 · PR · Apr 29
Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
Andrea Agazzi, Giuseppe Bruno, Eloy Mosig García et al.
We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of tokens is large. The bounds we establish are quantitative, and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. Finally, we characterize the activation functions that satisfy this condition.
LG · Feb 6, 2024
The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks
Andrea Bonfanti, Giuseppe Bruno, Cristina Cipriani
The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, we show that, unlike in the case of linear Partial Differential Equations (PDEs), the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that, contrary to conventional belief, the NTK yields a random matrix at initialization that is not constant during training. Another significant difference from the linear regime is that, even in the idealized infinite-width limit, the Hessian does not vanish and therefore cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both the linear and the nonlinear case, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.
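A toy calculation suggests why the Hessian survives for nonlinear PDEs. This is not the paper's setup. The sketch uses a surrogate that is linear in its parameters, $u_\theta(x) = \sum_k \theta_k \sin(k\pi x)$, as a stand-in for the linearized network of the infinite-width limit. The basis size `K`, the collocation point, and the constant source $f = 1$ are all hypothetical.

The sketch compares two residuals:
- the linear equation $u'' = f$;
- a hypothetical nonlinear equation $u'' + u^2 = f$.

Because $u_\theta$ is linear in $\theta$, the linear residual has zero Hessian in $\theta$. The quadratic term contributes $2\,\phi(x)\phi(x)^\top$, where $\phi(x)$ is the feature vector, whatever the current parameters are.

```python
import numpy as np

K = 4                               # number of basis functions (hypothetical)
ks = np.arange(1, K + 1)

def features(x: float) -> np.ndarray:
    """phi_k(x) = sin(k*pi*x) for k = 1..K."""
    return np.sin(ks * np.pi * x)

def features_xx(x: float) -> np.ndarray:
    """Second derivative of each feature: -(k*pi)^2 * sin(k*pi*x)."""
    return -(ks * np.pi) ** 2 * np.sin(ks * np.pi * x)

def residual(theta: np.ndarray, x: float, nonlinear: bool, f: float = 1.0) -> float:
    """Residual at collocation point x of u'' = f (linear) or u'' + u^2 = f (nonlinear)."""
    u = features(x) @ theta
    u_xx = features_xx(x) @ theta
    return u_xx + (u ** 2 if nonlinear else 0.0) - f

def hessian_fd(theta: np.ndarray, x: float, nonlinear: bool, h: float = 1e-3) -> np.ndarray:
    """Central finite-difference Hessian of the residual with respect to theta."""
    H = np.zeros((K, K))
    E = np.eye(K) * h
    for i in range(K):
        for j in range(K):
            H[i, j] = (residual(theta + E[i] + E[j], x, nonlinear)
                       - residual(theta + E[i] - E[j], x, nonlinear)
                       - residual(theta - E[i] + E[j], x, nonlinear)
                       + residual(theta - E[i] - E[j], x, nonlinear)) / (4 * h * h)
    return H

if __name__ == "__main__":
    theta = np.random.default_rng(0).standard_normal(K)
    x = 0.3
    print("linear PDE    max |H| =", np.abs(hessian_fd(theta, x, nonlinear=False)).max())
    print("nonlinear PDE max |H| =", np.abs(hessian_fd(theta, x, nonlinear=True)).max())
```

For the linear equation the printed value is zero up to round-off. For the nonlinear one it matches the entries of $2\,\phi(x)\phi(x)^\top$. This is consistent with the claim that linearizing the network alone does not remove second-order effects.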
LG · Oct 30, 2024
Emergence of meta-stable clustering in mean-field transformer models
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in (Geshkovski et al., 2023). We study the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, and provide a mathematical investigation of the long-term behavior of this system. We focus in particular on the emergence and persistence of meta-stable phases and clustering phenomena, which are key elements in applications such as next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the i.i.d. uniform initialization and prove that, in the limit of a large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, we identify this structure explicitly as a function of the model's inverse temperature parameter: it is determined by the index that maximizes a certain rescaling of Gegenbauer polynomials.
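For intuition about the particle system behind the mean-field PDE, here is a minimal simulation sketch. It assumes a simplified dynamics that uses the normalization $1/n$ in place of the softmax denominator:

$$\dot x_i = P^\perp_{x_i}\Big(\tfrac1n\sum_j e^{\beta\langle x_i,x_j\rangle}x_j\Big).$$

This is gradient ascent of the interaction energy $E_\beta = \tfrac{1}{2\beta n^2}\sum_{i,j}e^{\beta\langle x_i,x_j\rangle}$ on the product of spheres. Whether it matches the exact normalization used in the paper is an assumption. Tokens start i.i.d. uniform, as in the perturbative analysis. The values of `beta`, `dt`, `n`, and `d` are arbitrary choices.

```python
import numpy as np

def project_tangent(x: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Project each row of v onto the tangent space of the sphere at the matching row of x."""
    return v - np.sum(v * x, axis=1, keepdims=True) * x

def drift(x: np.ndarray, beta: float) -> np.ndarray:
    """Attention drift P_x^perp((1/n) * sum_j exp(beta <x_i, x_j>) x_j) for every token."""
    n = x.shape[0]
    a = np.exp(beta * (x @ x.T))    # pairwise interaction weights
    return project_tangent(x, a @ x / n)

def energy(x: np.ndarray, beta: float) -> float:
    """Interaction energy E = (1 / (2 beta n^2)) * sum_{i,j} exp(beta <x_i, x_j>)."""
    n = x.shape[0]
    return float(np.exp(beta * (x @ x.T)).sum() / (2 * beta * n * n))

def simulate(n: int = 64, d: int = 3, beta: float = 2.0, dt: float = 0.01,
             steps: int = 2000, seed: int = 0) -> tuple[np.ndarray, list[float]]:
    """Projected explicit Euler: step along the drift, then renormalize onto the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # i.i.d. uniform initialization
    energies = [energy(x, beta)]
    for _ in range(steps):
        x = x + dt * drift(x, beta)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
        energies.append(energy(x, beta))
    return x, energies

if __name__ == "__main__":
    x, energies = simulate()
    print(f"energy: {energies[0]:.3f} -> {energies[-1]:.3f}")
    print("mean pairwise <x_i, x_j>:", round(float((x @ x.T).mean()), 3))
```

Renormalizing after each Euler step is a simple retraction that keeps tokens on the sphere. Rerunning with different `beta` and `d` is one way to explore how quickly tokens gather.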
LG · Sep 29, 2025
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time. We model the tokens as a system of particles interacting in a mean-field way and study the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number $N$ of tokens is large and the inverse temperature parameter $\beta$ of the model scales together with $N$. In this regime, the dynamics of the system display a multiscale behavior:
- a fast phase, in which the empirical measure of the tokens collapses onto a low-dimensional space;
- an intermediate phase, in which the measure further collapses into clusters;
- a slow phase, in which these clusters sequentially merge into a single one.

We provide a rigorous characterization of the limiting dynamics in each of these phases, prove convergence in this limit, and illustrate our results with simulations.