Khaled Kirah

CR
7papers
14citations
Novelty53%
AI Score52

7 Papers

CRApr 20
Structural Dependency Analysis for Masked NTT Hardware: Scalable Pre-Silicon Verification of Post-Quantum Cryptographic Accelerators

Ray Iskander, Khaled Kirah

Post-quantum cryptographic (PQC) accelerators implementing ML-KEM (FIPS 203) and ML-DSA (FIPS 204) require side-channel resistance evidence for FIPS 140-3 certification. However, exact masking-verification tools scale only to gadgets of a few thousand cells. We present a four-stage verification hierarchy, D0/D1 structural dependency analysis, fresh-mask refinement, Boolean Single-Authentication Distance Checking (SADC), and arithmetic SADC, that extends sound first-order masking verification to production arithmetic modules. Applied to the 1.17-million-cell Adams Bridge ML-DSA/ML-KEM accelerator, structural analysis completes in seconds across all 30 masked submodules. A multi-cycle extension (MC-D1) reclassifies 12 modules from structurally clean to structurally flagged. On the 5,543-cell ML-KEM Barrett reduction module, the pipeline machine-verifies 198 of 363 structurally flagged wires (54.5%) as first-order secure, reports 165 as candidate insecure for designer triage (a sound upper bound), and leaves 0 indeterminate. Every verdict is cross validated by Z3 and CVC5 with 0 disagreements across 363 wires. The result narrows manual review from hundreds of structural flags to 165 actionable candidates with mathematical certificates, enabling pre-silicon side-channel evidence generation on production ML-KEM hardware.

CRApr 20
From Finite Enumeration to Universal Proof: Ring-Theoretic Foundations for PQC Hardware Masking Verification

Ray Iskander, Khaled Kirah

Formal verification of masking in post-quantum cryptographic (PQC) hardware relies on SMT solvers over finite domains. Our prior work established structural dependency analysis at scale [1] and quantified the security margin of partial NTT masking [2]. QANARY, our structural dependency analysis framework, verified 1.17 million cells across 30 modules of the Adams Bridge ML-DSA/ML-KEM accelerator [3, 4], but its core soundness result (Theorem 3.9.1) was machine-checked only at $q = 5$ via $2^{25}$ Boolean wire functions. This left portability to ML-KEM ($q = 3{,}329$, FIPS 203 [5]) and ML-DSA ($q = 8{,}380{,}417$, FIPS 204 [6]) as an open gap. NIST IR 8547 [7] (March 2025) motivates closing such gaps. We present the first machine-checked universal proof of the $r$-free sub-theorem of Theorem 3.9.1: for every $q > 0$, every wire function, and every pair of secrets, value-independence implies identical marginal distributions. The proof, in Lean 4 [8] with Mathlib [9], requires five lines versus $2^{25}$ finite evaluations. It is sorry-free, reducing the trusted base from \{Z3 [10], CVC5 [11], Python\} to the Lean 4 kernel. We provide nine theorems (T1--T6, T1', T3') covering reparametrization, bijectivity, overflow bounds, RNG bias, and a universal non-tightness counterexample for all $q \geq 2$. The results establish commutative ring axioms of $\mathbb{Z}/q\mathbb{Z}$ as the natural abstraction layer for arithmetic masking verification.

CRApr 4
Partial Number Theoretic Transform Masking in Post Quantum Cryptography Hardware: A Security Margin Analysis

Ray Iskander, Khaled Kirah

Adams Bridge, a hardware accelerator for ML-DSA and ML-KEM designed for the Caliptra root of trust, masks 1 of its Inverse Number Theoretic Transform (INTT) layers and relies on shuffling for the remainder, claiming per-butterfly Correlation Power Analysis (CPA) complexities of 2^46 (ML-DSA) and 2^96 (ML-KEM). We evaluate these claims against published side-channel literature across seven analysis tracks with confidence-rated evidence. Register-Transfer Level (RTL) analysis confirms that the design's Random Start Index (RSI) shuffling provides 6 bits of entropy per layer (64 orderings) rather than the 296 bits of a full random permutation assumed in its scaling argument, with effective margins below the designers' estimates. A soft-analytical attack pipeline demonstrates a 37-bit enumeration reduction, independent of Belief Propagation (BP) gains, quantifying the attack-model gap without achieving key recovery. Full-scale BP on the complete INTT factor graph achieves 100% coefficient recovery over the single-layer baseline, resolving whether BP gains scale to production-size Number Theoretic Transform (NTT) structures. A genie-aided information-theoretic bound shows observations contain sufficient mutual information for full recovery at SNRxN as low as 15. Layer-ablation analysis identifies four necessary conditions governing BP convergence. Observation topology, not count, determines recovery: 4 evenly spread layers achieve 100% while 4 consecutive layers achieve 0%, yielding a practical countermeasure design tool. Strategic masking of 3 consecutive mid-layers (43% overhead vs. full masking) creates an unrecoverable gap that defeats soft-analytical attacks. We contribute a reusable security margin audit methodology combining RTL verification, epistemic confidence tagging, sensitivity-scenario analysis, and experimental validation applicable to any partially masked NTT accelerator.

CRMay 4
The 1-Bit Barrier is Universal: k-Stage Pipeline Composition and Unified Leakage Bounds for Standard Modular Reductions in PQC Hardware

Ray Iskander, Khaled Kirah

This is Paper 7 of a series of formally-verified analyses of masked NTT hardware for post-quantum cryptography; Paper 1 [1] established structural dependency analysis of the QANARY platform, and Paper 2 [2] quantified security margins under partial NTT masking. Arbitrary-depth $k$-stage masked NTT pipelines with fresh inter-stage masking and per-stage PF-PINI($\leq 2$) gadgets satisfy a per-observation cardinality bound of $2 \cdot q^{2k-2}$ on the preimage of any output value, machine-checked in Lean 4 with zero \texttt{sorry}. Under the standard (informal) semantic translation that divides this cardinality by the total mask-tuple space size $q^{2k-1}$, the per-observation conditional probability bound is $2/q$, independent of pipeline depth $k$. The QANARY program has previously established machine-checked cardinality bounds on the per-observation leakage of masked NTT hardware: PF-PINI(2) for Barrett reduction (Paper 5 [3]), 2-stage composition with fresh inter-stage masking (Paper 6 [4]), an underlying universality theorem (Paper 3 [5]), and PF-PINI(1) for butterfly wires (Paper 4 [6]). This paper closes the program with four contributions. First, a $k$-stage composition theorem generalizing Paper 6's two-stage result to arbitrary $k \geq 1$ gives the last-stage-determined bound $G_{k-1}.\texttt{maxMult} \cdot q^{2k-2}$: only the last stage's PF-PINI parameter survives, with intermediate parameters erased by fresh inter-stage masking. Second, Montgomery reduction satisfies PF-PINI(2) with tight max-multiplicity 2. Third, we assemble these into the end-to-end bound $2 \cdot q^{2k-2}$ for any depth-$k$ PF-PINI($\leq 2$) pipeline under fresh inter-stage masking. Fourth, a Lean-verified hypothesis-violation conditional anchors the prior empirical and structural Adams Bridge analyses ([1, 2, 7, 8]).

CRApr 27
Machine-Checked Cardinality Bounds for Masked Barrett Reduction: A 1-Bit Side-Channel Leakage Barrier in Post-Quantum Cryptographic Hardware

Ray Iskander, Khaled Kirah

Barrett reduction is the nonlinear core of every practical NTT-based post-quantum cryptography implementation. Existing composition frameworks (ISW, t-SNI, PINI, DOM) address Boolean masking over GF(2); none provides a machine-checked characterization of Barrett's leakage under first-order arithmetic masking and the first-order probing model over prime fields. Building on our prior series, QANARY [15], partial-NTT-masking margins [14], algebraic foundations [16], and butterfly composition [18], we close this gap. We prove a trichotomy: for any $q > 0$ and shift $s$, the Barrett internal wire map $f_x(m) = ((x + 2^s - m) \bmod 2^s) \bmod q$ has preimage cardinality in $\{0, 1, 2\}$, never more. We call this the 1-Bit Barrier: max-multiplicity 2 implies at most 1 bit of min-entropy loss per internal wire, universal over all moduli. The count-zero cases, unreachable output values, reveal that actual leakage is often strictly less than 1 bit, making the bound conservative. We introduce PF-PINI (Prime-Field PINI): Barrett satisfies PF-PINI(2); the Cooley-Tukey butterfly satisfies PF-PINI(1). We observe (not yet proved) that with fresh inter-stage masking, the composed pipeline has max-multiplicity $\max(k_1, k_2)$, so the 1-Bit Barrier propagates. The trichotomy, the PF-PINI instantiations, and cardinality results are machine-checked in Lean 4 with Mathlib: 12 proved results, zero sorry, universal over all $q > 0$ (the min-entropy bound follows by standard definitions). Adams Bridge lacks fresh inter-stage masking, violating PF-PINI composition and explaining why Papers 1 [15] and 2 [14] found vulnerabilities. NIST IR 8547 recommends formal methods for PQC implementation validation. The 1-Bit Barrier provides the first universal machine-checked cardinality bound for masked Barrett reduction in ML-KEM (FIPS 203) and ML-DSA (FIPS 204), with a corresponding 1-bit leakage interpretation.

CRApr 28
Prime-Field PINI: Machine-Checked Composition Theorems for Post-Quantum NTT Masking

Ray Iskander, Khaled Kirah

This is Paper 6 of a series of formally-verified analyses of masked NTT hardware for post-quantum cryptography; Paper 1 [1] established structural dependency analysis of the QANARY platform, and Paper 2 [2] quantified security margins under partial NTT masking. Boolean masking composition is well-understood through NI, SNI, and PINI. Arithmetic masking over $\mathbb{Z}_q$ for prime $q$, the foundation of NTT-based post-quantum cryptography, has lacked an analogous theory. We prove, to our knowledge, the first machine-checked composition theorems for arithmetic masking over prime fields. Our key insight is the renewal argument: when a fresh random mask is applied between two pipeline stages, the intermediate wire becomes perfectly uniform regardless of Stage 1's security parameter. For two PF-PINI gadgets with parameters $k_1$ and $k_2$, the composed two-stage pipeline with fresh masking satisfies PF-PINI($k_2$), Stage 1's multiplicity is completely erased from the composed output. Without fresh masking, intermediate wires have multiplicity up to $k_1$, creating a necessary condition for differential power analysis. We formalize both theorems in Lean 4 with 18 machine-checked proofs and zero sorry stubs. We formally bridge the algebraic and hardware-faithful arithmetic models of Barrett reduction, and instantiate the theorems to formally diagnose Microsoft's Adams Bridge PQC accelerator: its absence of fresh inter-stage masking leaves Barrett output wires non-uniform under the first-order probing model, the same architectural flaw that two independent empirical analyses [3, 4] and our own prior structural analysis [1] identified. Computational evidence further suggests the 1-Bit Barrier is universal across Barrett and Montgomery reductions.

CRApr 22
Fresh Masking Makes NTT Pipelines Composable: Machine-Checked Proofs for Arithmetic Masking in PQC Hardware

Ray Iskander, Khaled Kirah

Post-quantum cryptographic (PQC) accelerators for ML-KEM (FIPS 203) and ML-DSA (FIPS 204) rely on pipelined Number Theoretic Transform (NTT) stages over $\mathbb{Z}_q$. Our prior work established structural dependency analysis at scale [1] and quantified the security margin of partial NTT masking [2]. Whether per-stage arithmetic masking guarantees pipeline-level security had no prior machine-checked answer for the r-bearing case: composition frameworks (ISW, t-SNI, PINI, DOM) were formalized exclusively for Boolean masking over $\mathrm{GF}(2)$; no proof assistant artifact addresses the NTT butterfly over $\mathbb{Z}_q$. We present three machine-checked results in Lean 4 with Mathlib, all zero sorry. First, we close a stated limitation of prior work: value-independence implies constant marginal distribution under fresh randomness (via an algebraic MutualInfoZero proxy). Second, butterfly per-context uniformity: for any Cooley-Tukey butterfly with fresh output mask over $\mathbb{Z}/q\mathbb{Z}$ ($q > 0$), each output wire has exactly one mask value producing each output, a uniform marginal independent of secrets, universal over all moduli, twiddle factors, and inputs. Third, a k-stage NTT pipeline with fresh per-stage masking satisfies per-context uniformity at every stage under the ISW first-order probing model. We document a named warning: pointwise value-independence is false for butterfly outputs. The Adams Bridge accelerator (CHIPS Alliance Caliptra) fails the fresh masking hypothesis, masking active only in INTT round 0, architecturally explaining its structural insecurity. Artifact: nine theorems, 1,738 build jobs, zero sorry. Composition for nonlinear gadgets (Barrett) is addressed in forthcoming manuscripts proving Barrett's PF-PINI(2) satisfaction ('one-bit barrier') [3] and k-stage composition for PF-PINI gadgets under fresh-mask renewal [4].