DSApr 21, 2010
Some linear-time algorithms for systolic arraysRichard P. Brent, Franklin T. Luk, H. T. Kung · harvard
We survey some results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n * n Toeplitz systems of linear equations can be solved in time O(n) on a linear array of O(n) cells, each of which has constant memory size (independent of n). Finally, we outline how a two-dimensional square array of O(n)* O(n) cells can be used to solve (to working accuracy) the eigenvalue problem for a symmetric real n* n matrix in time O(nS(n)). Here S(n) is a slowly growing function of n; for practical purposes S(n) can be regarded as a constant. In addition to their theoretical interest, these results have potential applications in the areas of error-correcting codes, symbolic and algebraic computations, signal processing and image processing.
NAMay 30, 2010
Multiple-precision zero-finding methods and the complexity of elementary function evaluationRichard P. Brent
We consider methods for finding high-precision approximations to simple zeros of smooth functions. As an application, we give fast methods for evaluating the elementary functions log(x), exp(x), sin(x) etc. to high precision. For example, if x is a positive floating-point number with an n-bit fraction, then (under rather weak assumptions) an n-bit approximation to log(x) or exp(x) may be computed in time asymptotically equal to 13M(n)lg(n), where M(n) is the time required to multiply floating-point numbers with n-bit fractions. Similar results are given for the other elementary functions. Some analogies with operations on formal power series (over a field of characteristic zero) are discussed. In particular, it is possible to compute the first n terms in log(1 + a_1.x + ...) or exp(a_1.x + ...) in time O(M(n)), where M(n) is the time required to multiply two polynomials of degree n - 1. It follows that the first n terms in a q-th power (1 + a_1.x + ...)^q can be computed in time O(M(n)), independent of q. One of the results of this paper is the "Gauss-Legendre" or "Brent-Salamin" algorithm for computing pi. This is the first quadratically convergent algorithm for pi. It was also published in Brent [J. ACM 23 (1976), 242-251], and independently by Salamin [Math. Comp. 30 (1976), 565-570].
NAApr 20, 2010
On the precision attainable with various floating-point number systemsRichard P. Brent
For scientific computations on a digital computer the set of real number is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with difference choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix) in the usual floating-point systems. The emphasis is on the choice of F, not on the details of the number representation or the arithmetic, but both rounded and truncated arithmetic are considered. Theoretical results are given, and some simulations of typical floating-point computations (forming sums, solving systems of linear equations, finding eigenvalues) are described. If the leading fraction bit of a normalized base 2 number is not stored explicitly (saving a bit), and the criterion is to minimize the mean square roundoff error, then base 2 is best. If unnormalized numbers are allowed, so the first bit must be stored explicitly, then base 4 (or sometimes base 8) is the best of the usual systems.
MSApr 20, 2010
MP users guideRichard P. Brent
MP is a package of ANSI Standard Fortran (ANS X3.9-1966) subroutines for performing multiple-precision floating-point arithmetic and evaluating elementary and special functions. The subroutines are machine independent and the precision is arbitrary, subject to storage limitations. The User's Guide describes the routines and their calling sequences, example and test programs, use of the Augment precompiler, and gives installation instructions for the package.
DSApr 20, 2010
Fast normal random number generators on vector processorsRichard P. Brent
We consider pseudo-random number generators suitable for vector processors. In particular, we describe vectorised implementations of the Box-Muller and Polar methods, and show that they give good performance on the Fujitsu VP2200. We also consider some other popular methods, e.g. the Ratio method of Kinderman and Monahan (1977) (as improved by Leva (1992)), and the method of Von Neumann and Forsythe, and show why they are unlikely to be competitive with the Polar method on vector processors.
NAApr 30, 2010
An asymptotic expansion inspired by RamanujanRichard P. Brent
Corollary 2, Entry 9, Chapter 4 of Ramanujan's first notebook claims that a certain sum is asymptotic to ln(x) + gamma, where x is a real variable in the sum and gamma is Euler's constant. Ramanujan's claim is known to be correct for the case n = 1, but incorrect for n > 2 (here n is an integer parameter in the sum). We show that the result is correct for n = 2. We also consider the order of the error term, and discuss a different, correct generalisation of the case n = 1.
NTJul 27, 2012
On the sign of the real part of the Riemann zeta-functionJuan Arias de Reyna, Richard P. Brent, Jan van de Lune
We consider the distribution of $\argζ(σ+it)$ on fixed lines $σ> \frac12$, and in particular the density \[d(σ) = \lim_{T \rightarrow +\infty} \frac{1}{2T} |\{t \in [-T,+T]: |\argζ(σ+it)| > π/2\}|\,,\] and the closely related density \[d_{-}(σ) = \lim_{T \rightarrow +\infty} \frac{1}{2T} |\{t \in [-T,+T]: \Reζ(σ+it) < 0\}|\,.\] Using classical results of Bohr and Jessen, we obtain an explicit expression for the characteristic function $ψ_σ(x)$ associated with $\argζ(σ+it)$. We give explicit expressions for $d(σ)$ and $d_{-}(σ)$ in terms of $ψ_σ(x)$. Finally, we give a practical algorithm for evaluating these expressions to obtain accurate numerical values of $d(σ)$ and $d_{-}(σ)$.
NAMay 7, 2010
George Forsythe's last paperRichard P. Brent
We describe von Neumann's elegant idea for sampling from the exponential distribution, Forsythe's generalization for sampling from a probability distribution whose density has the form exp(-G(x)), where G(x) is easy to compute (e.g. a polynomial), and my refinement of these ideas to give an efficient algorithm for generating pseudo-random numbers with a normal distribution. Later developments are also mentioned.
DSApr 19, 2010
A fast vectorised implementation of Wallace's normal random number generatorRichard P. Brent
Wallace has proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essentially matrix-vector multiplications and are very suitable for implementation on vector processors or vector/parallel processors such as the Fujitsu VPP300. In this report we outline Wallace's idea, consider some variations on it, and describe a vectorised implementation RANN4 which is more than three times faster than its best competitors (the Polar and Box-Muller methods) on the Fujitsu VP2200 and VPP300.
NAOct 6, 2016
On asymptotic approximations to the log-Gamma and Riemann-Siegel theta functionsRichard P. Brent
We give bounds on the error in the asymptotic approximation of the log-Gamma function $\lnΓ(z)$ for complex $z$ in the right half-plane. These improve on earlier bounds by Behnke and Sommer (1962), Spira (1971), and Hare (1997). We show that $|R_{k+1}(z)/T_k(z)| < \sqrt{πk}$ for nonzero $z$ in the right half-plane, where $T_k(z)$ is the $k$-th term in the asymptotic series, and $R_{k+1}(z)$ is the error incurred in truncating the series after $k$ terms. If $k \le |z|$, then the stronger bound $|R_{k+1}(z)/T_k(z)| < (k/|z|)^2/(π^2-1) < 0.113$ holds. Similarly for the asymptotic approximation of $\lnΓ(z+\frac{1}{2})$, except that a factor $η_k = 1/(1-2^{1-2k})$ multiplies some of the bounds. We deduce similar bounds for asymptotic approximation of the Riemann-Siegel theta function $\vartheta(t)$. We show that the accuracy of a well-known approximation to $\vartheta(t)$ can be improved by including an exponentially small term in the approximation. This improves the attainable accuracy for real $t>0$ from $O(\exp(-πt))$ to $O(\exp(-2πt))$. We discuss a similar example due to Olver (1964), and a connection with the Stokes phenomenon.