Zvonimir Bujanović

3papers

64citations

Novelty38%

AI Score20

Ranked #193,822 of 205,806 authors (top 94%)#2,245 in NA (top 67%)

3 Papers

NAOct 9, 2016

RADI: A low-rank ADI-type algorithm for large scale algebraic Riccati equations

Peter Benner, Zvonimir Bujanović, Patrick Kürschner et al.

This paper introduces a new algorithm for solving large-scale continuous-time algebraic Riccati equations (CARE). The advantage of the new algorithm is in its immediate and efficient low-rank formulation, which is a generalization of the Cholesky-factored variant of the Lyapunov ADI method. We discuss important implementation aspects of the algorithm, such as reducing the use of complex arithmetic and shift selection strategies. We show that there is a very tight relation between the new algorithm and three other algorithms for CARE previously known in the literature -- all of these seemingly different methods in fact produce exactly the same iterates when used with the same parameters: they are algorithmically different descriptions of the same approximation sequence to the Riccati solution.

NAMay 29, 2018

A Householder-based algorithm for Hessenberg-triangular reduction

Zvonimir Bujanović, Lars Karlsson, Daniel Kressner

The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil $A - λB$ requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern leads to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors, by using (compact) WY representations. Even though the new algorithm requires more floating point operations than the state of the art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work.

MSAug 21, 2017

Parallel solver for shifted systems in a hybrid CPU-GPU framework

Nela Bosner, Zvonimir Bujanović, Zlatko Drmač

This paper proposes a combination of a hybrid CPU--GPU and a pure GPU software implementation of a direct algorithm for solving shifted linear systems $(A - σI)X = B$ with large number of complex shifts $σ$ and multiple right-hand sides. Such problems often appear e.g. in control theory when evaluating the transfer function, or as a part of an algorithm performing interpolatory model reduction, as well as when computing pseudospectra and structured pseudospectra, or solving large linear systems of ordinary differential equations. The proposed algorithm first jointly reduces the general full $n\times n$ matrix $A$ and the $n\times m$ full right-hand side matrix $B$ to the controller Hessenberg canonical form that facilitates efficient solution: $A$ is transformed to a so-called $m$-Hessenberg form and $B$ is made upper-triangular. This is implemented as blocked highly parallel CPU--GPU hybrid algorithm; individual blocks are reduced by the CPU, and the necessary updates of the rest of the matrix are split among the cores of the CPU and the GPU. To enhance parallelization, the reduction and the updates are overlapped. In the next phase, the reduced $m$-Hessenberg--triangular systems are solved entirely on the GPU, with shifts divided into batches. The benefits of such load distribution are demonstrated by numerical experiments. In particular, we show that our proposed implementation provides an excellent basis for efficient implementations of computational methods in systems and control theory, from evaluation of transfer function to the interpolatory model reduction.