Recursive vectorized computation of the vector $p$-norm
This work addresses the need for more accurate norm computation in numerical computing, but the improvements are incremental over existing methods.
The paper proposes recursive algorithms for computing the Frobenius norm of a real array, showing they can be significantly more accurate than the BLAS routine DNRM2 in many cases. The algorithms are vectorized with Intel vector instructions for performance comparable to DNRM2 and parallelized with OpenCilk, with some scalar versions being unconditionally bitwise reproducible.
Recursive algorithms for computing the Frobenius norm of a real array are proposed, based on hypot, a hypotenuse function. Comparing their relative accuracy bounds with those of the BLAS routine DNRM2 it is shown that the proposed algorithms could in many cases be significantly more accurate. The scalar recursive algorithms are vectorized with the Intel's vector instructions to achieve performance comparable to DNRM2, and are further parallelized with OpenCilk. Some scalar algorithms are unconditionally bitwise reproducible, while the reproducibility of the vector ones depends on the vector width. A modification of the proposed algorithms to compute the vector $p$-norm is also presented.