Hideharu Amano

DCAug 21, 2014

An Automatic Mixed Software Hardware Pipeline Builder for CPU-FPGA Platforms

Takaaki Miyajima, David Thomas, Hideharu Amano

Our toolchain for accelerating application called Courier-FPGA, is designed for utilize the processing power of CPU-FPGA platforms for software programmers and non-expert users. It automatically gathers runtime information of library functions from a running target binary, and constructs the function call graph including input-output data. Then, it uses corresponding predefined hardware modules if these are ready for FPGA and prepares software functions on CPU by using Pipeline Generator. The Pipeline Generator builds a pipeline control program by using Intel Threading Building Block to run both hardware modules and software functions in parallel. Finally, Courier-FPGA dynamically replaces the original functions in the binary and accelerates it by using the built pipeline. Courier-FPGA performs these acceleration processes without user intervention, source code tweaks or re-compilations of the binary. We describe the technical details of this mixed software hardware pipeline on CPU-FPGA platforms in this paper. In our case study, Courier-FPGA was used to accelerate a corner detection using the Harris-Stephens method application binary on the Zynq platform. A series of functions were off-loaded, and speed up 15.36 times was achieved by using the built pipeline.

QUANT-PHMar 26, 2019

Extracting Success from IBM's 20-Qubit Machines Using Error-Aware Compilation

Shin Nishio, Yulu Pan, Takahiko Satoh et al.

NISQ (Noisy, Intermediate-Scale Quantum) computing requires error mitigation to achieve meaningful computation. Our compilation tool development focuses on the fact that the error rates of individual qubits are not equal, with a goal of maximizing the success probability of real-world subroutines such as an adder circuit. We begin by establishing a metric for choosing among possible paths and circuit alternatives for executing gates between variables placed far apart within the processor, and test our approach on two IBM 20-qubit systems named Tokyo and Poughkeepsie. We find that a single-number metric describing the fidelity of individual gates is a useful but imperfect guide. Our compiler uses this subsystem and maps complete circuits onto the machine using a beam search-based heuristic that will scale as processor and program sizes grow. To evaluate the whole compilation process, we compiled and executed adder circuits, then calculated the KL-divergence (a measure of the distance between two probability distributions). For a circuit within the capabilities of the hardware, our compilation increases estimated success probability and reduces KL-divergence relative to an error-oblivious placement.

Hideharu Amano

2 Papers