Yaobo Zhang

LG
5 papers
84 citations
Novelty 45%
AI Score 43

5 Papers

LG Jun 3
PJ-RoPE: A Fourier-Jet-Affine Position Space for Relative Attention

Yaobo Zhang

We unify RoPE's Fourier phase, Jordan-RoPE's finite jets, and ALiBi's affine recency into a single learnable relative-position space, and study which regions of this space are selected by different tasks. PJ-RoPE is a Fourier-Jet-Affine formulation for relative attention, with an optional Poincaré-type reading as the affine completion of a homogeneous Fourier-jet positional representation. Algebraically, the same primitives form a finite constant-coefficient difference module: simple roots of the lag-shift operator give Fourier/RoPE characters, repeated nonzero roots give Jordan/Fourier jets, and the repeated unit root gives ALiBi-like affine recency. The framework separates scalar PJ-bias kernels from exact PJ-rotary feature transforms, introduces adaptive sector diagnostics, and uses LC/rapidity coordinates to stabilize high-order jets. Controlled probes verify sector containment and selection; small language runs expose an affine/recency boundary; music-token streams provide the clearest case where LC/affine variants remain strong while carrying measurable high-order corrections; and LC diagnostics show a scale-stability gain coupled to phase-resolution loss.
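The root classification in the abstract above can be checked numerically. The following is a minimal sketch, not the paper's code: the helper `shift_annihilator` and the values of `omega`, `gamma`, and `slope` are assumptions chosen for illustration. It applies $(S - r)^k$, where $(Sf)(d) = f(d+1)$ is the lag shift, and confirms that a simple root annihilates a Fourier character, a double nonzero root annihilates the jet $d\,\lambda^d$, and the double unit root annihilates an affine recency term.

```python
import cmath

def shift_annihilator(f, root, multiplicity):
    """Return g = (S - root)^multiplicity f, where (S f)(d) = f(d + 1)."""
    for _ in range(multiplicity):
        # Bind the current function explicitly so each layer wraps the previous one.
        f = (lambda h: lambda d: h(d + 1) - root * h(d))(f)
    return f

# Illustrative (assumed) parameter values.
omega, gamma, slope = 0.7, 0.1, 0.25
unit_root = cmath.exp(1j * omega)                # root on the unit circle
damped_root = cmath.exp(complex(-gamma, omega))  # nonzero root inside the circle

fourier = lambda d: unit_root ** d        # RoPE-like character
jet = lambda d: d * damped_root ** d      # first-order Jordan/Fourier jet
affine = lambda d: -slope * d             # ALiBi-like affine recency

residuals = {
    "fourier": abs(shift_annihilator(fourier, unit_root, 1)(5)),
    "jet": abs(shift_annihilator(jet, damped_root, 2)(5)),
    "jet_first_order": abs(shift_annihilator(jet, damped_root, 1)(5)),
    "affine": abs(shift_annihilator(affine, 1.0, 2)(5)),
}
print(residuals)
```

The `jet_first_order` entry stays clearly nonzero, which shows that the jet needs the repeated root: a single factor $(S - \lambda)$ leaves $\lambda^{d+1}$ behind.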

LG May 5
Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks

Yaobo Zhang

Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{i\omega d}$, rather than merely adding a separate distance channel to RoPE. We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.
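To make the oscillatory-polynomial features concrete, here is a small sketch under stated assumptions. It uses the textbook closed form $\exp(dA) = e^{\mu d}\begin{pmatrix}1 & d\\ 0 & 1\end{pmatrix}$ for the defective generator $A = \begin{pmatrix}\mu & 1\\ 0 & \mu\end{pmatrix}$ with $\mu = -\gamma + i\omega$. The function names `jordan_block_at_lag`, `matmul2`, and `lag_features` are hypothetical, and this complex $2\times 2$ form is not claimed to be the paper's real block parameterization.

```python
import cmath
import math

def jordan_block_at_lag(d, gamma, omega):
    """Closed form of exp(d * A) for A = [[mu, 1], [0, mu]], mu = -gamma + i*omega."""
    scale = cmath.exp(complex(-gamma, omega) * d)
    return [[scale, d * scale], [0j, scale]]

def matmul2(a, b):
    """Plain 2x2 complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lag_features(d, gamma, omega):
    """Real features read off the diagonal and off-diagonal entries."""
    t = jordan_block_at_lag(d, gamma, omega)
    return (t[0][0].real,   # e^{-gamma d} cos(omega d)
            t[0][0].imag,   # e^{-gamma d} sin(omega d)
            t[0][1].real,   # d e^{-gamma d} cos(omega d)
            t[0][1].imag)   # d e^{-gamma d} sin(omega d)

# Exact group law: T(2) T(3) should equal T(5) up to rounding.
gamma, omega = 0.1, 0.5   # illustrative values
product_23 = matmul2(jordan_block_at_lag(2, gamma, omega),
                     jordan_block_at_lag(3, gamma, omega))
direct_5 = jordan_block_at_lag(5, gamma, omega)
group_law_error = max(abs(product_23[i][j] - direct_5[i][j])
                      for i in range(2) for j in range(2))
print(lag_features(3, gamma, omega), group_law_error)
```

The off-diagonal entry carries the factor $d$ multiplied into the same phase $e^{i\omega d}$, which is the coupling the abstract contrasts with a separate distance channel.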

QUANT-PH Feb 5, 2021
Effects of quantum resources on the statistical complexity of quantum circuits

Kaifeng Bu, Dax Enshan Koh, Lu Li et al.

We investigate how the addition of quantum resources changes the statistical complexity of quantum circuits by utilizing the framework of quantum resource theories. Measures of statistical complexity that we consider include the Rademacher complexity and the Gaussian complexity, which are well-known measures in computational learning theory that quantify the richness of classes of real-valued functions. We derive bounds for the statistical complexities of quantum circuits that have limited access to certain resources and apply our results to two special cases: (1) stabilizer circuits that are supplemented with a limited number of T gates and (2) instantaneous quantum polynomial-time Clifford circuits that are supplemented with a limited number of CCZ gates. We show that the increase in the statistical complexity of a quantum circuit when an additional quantum channel is added to it is upper bounded by the free robustness of the added channel. Finally, we derive bounds for the generalization error associated with learning from training data arising from quantum circuits.

QUANT-PH Jan 15, 2021
On the statistical complexity of quantum circuits

Kaifeng Bu, Dax Enshan Koh, Lu Li et al.

In theoretical machine learning, the statistical complexity is a notion that measures the richness of a hypothesis space. In this work, we apply a particular measure of statistical complexity, namely the Rademacher complexity, to the quantum circuit model in quantum computation and study how the statistical complexity depends on various quantum circuit parameters. In particular, we investigate the dependence of the statistical complexity on the resources, depth, width, and the number of input and output registers of a quantum circuit. To study how the statistical complexity scales with resources in the circuit, we introduce a resource measure of magic based on the $(p,q)$ group norm, which quantifies the amount of magic in the quantum channels associated with the circuit. These dependencies are investigated in the following two settings: (i) where the entire quantum circuit is treated as a single quantum channel, and (ii) where each layer of the quantum circuit is treated as a separate quantum channel. The bounds we obtain can be used to constrain the capacity of quantum neural networks in terms of their depths and widths as well as the resources in the network.
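For readers unfamiliar with the quantity, the empirical Rademacher complexity of a function class $F$ on samples $x_1,\dots,x_m$ is $\mathbb{E}_\sigma \sup_{f\in F} \frac{1}{m}\sum_i \sigma_i f(x_i)$, with $\sigma_i$ uniform in $\{-1,+1\}$. The sketch below computes it exactly for a small finite class by enumerating all sign vectors. It illustrates only the classical definition, not the quantum-circuit bounds of the paper, and the function name `empirical_rademacher` is an assumption made for this example.

```python
from itertools import product

def empirical_rademacher(function_values):
    """Exact empirical Rademacher complexity of a finite function class.

    function_values: one list per function, holding that function's outputs
    on the same m sample points. Enumerates all 2^m sign vectors, so it is
    only practical for small m.
    """
    m = len(function_values[0])
    total = 0.0
    for signs in product((-1, 1), repeat=m):
        total += max(sum(s * v for s, v in zip(signs, values)) / m
                     for values in function_values)
    return total / 2 ** m

# A single function cannot correlate with random signs on average.
single = empirical_rademacher([[1.0, 1.0]])
# Adding the negated function makes the class richer.
symmetric = empirical_rademacher([[1.0, 1.0], [-1.0, -1.0]])
print(single, symmetric)
```

The richer class scores higher because for more sign patterns some member of the class aligns with the signs.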

LG Oct 15, 2020
Depth-Width Trade-offs for Neural Networks via Topological Entropy

Kaifeng Bu, Yaobo Zhang, Qingxian Luo

One of the central problems in deep learning theory is to understand how structural properties, such as depth, width, and the number of nodes, affect the expressivity of deep neural networks. In this work, we show a new connection between the expressivity of deep neural networks and topological entropy from dynamical systems, which can be used to characterize depth-width trade-offs of neural networks. We provide an upper bound on the topological entropy of neural networks with continuous semi-algebraic units in terms of the structure parameters. Specifically, the topological entropy of a ReLU network with $l$ layers and $m$ nodes per layer is upper bounded by $O(l\log m)$. In addition, if the neural network is a good approximation of some function $f$, then the size of the neural network has an exponential lower bound with respect to the topological entropy of $f$. Moreover, we discuss the relationship between topological entropy, the number of oscillations, periods, and the Lipschitz constant.
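The link between depth, oscillations, and entropy can be illustrated with the standard tent-map example, which is an assumption of this sketch rather than something taken from the paper. The tent map on $[0,1]$ is a width-2 ReLU layer, its topological entropy is $\log 2$, and composing it $l$ times gives a depth-$l$ network with $2^l$ monotone pieces. For piecewise monotone interval maps the growth rate of this piece count equals the topological entropy, so $\log(\text{pieces})/l$ recovers $\log 2$. The names `tent`, `iterate`, and `lap_count` are hypothetical.

```python
import math

def relu(x):
    return max(0.0, x)

def tent(x):
    """Tent map on [0, 1] written as a width-2 ReLU layer."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def iterate(x, depth):
    """Apply the tent layer `depth` times (a depth-`depth` ReLU network)."""
    for _ in range(depth):
        x = tent(x)
    return x

def lap_count(depth, grid_bits=12):
    """Count monotone pieces of the composed map on a dyadic grid.

    Assumes depth < grid_bits so that every turning point lies on the grid;
    dyadic inputs keep the arithmetic exact in floating point.
    """
    n = 2 ** grid_bits
    ys = [iterate(k / n, depth) for k in range(n + 1)]
    signs = [1 if b > a else -1 for a, b in zip(ys, ys[1:])]
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    return changes + 1

for depth in range(1, 6):
    laps = lap_count(depth)
    print(depth, laps, math.log(laps) / depth)
```

The piece count doubles with each added layer while the width stays fixed at 2, which is the kind of depth-driven oscillation growth that a shallow network can only match by becoming much wider.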