Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system

T. Hauser
A. Dasu
A. Sudarsanam
S. Young

Abstract

Lower/Upper triangular (LU) factorization plays an important role in
scientific and high-performance computing. This paper presents an
implementation of the LU decomposition algorithm for double-precision
complex numbers on a star-topology-based multi-FPGA platform. The
out-of-core implementation moves data through multiple levels of a
hierarchical memory system (hard disk, DDR SDRAMs and FPGA block
RAMs) using completely pipelined data paths in all steps of the
algorithm. Detailed performance numbers for all phases of the
algorithm are presented and compared to a highly optimized
implementation for a low-power microprocessor-based system. We also
compare the performance per Watt of the FPGA and the microprocessor
systems. Finally, recommendations are given on how improvements to
the FPGA design would increase the performance of the
double-precision complex LU factorization on the FPGA-based system.
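
For readers unfamiliar with the underlying operation, the following is a minimal in-memory sketch of LU factorization on complex numbers. It is not the implementation described in the paper: the abstract does not state the blocking scheme, the pivoting strategy, or the data layout, so this sketch assumes the plain Doolittle variant without pivoting and without any out-of-core blocking. The function names `lu_decompose` and `matmul` and the sample matrix are hypothetical and chosen only for illustration. Python's built-in `complex` type stores two double-precision values, which matches the number format named in the abstract.

```python
def lu_decompose(a):
    """Factor a square complex matrix as a = L * U (Doolittle, no pivoting).

    Assumption: every pivot encountered is nonzero. A production code
    would normally pivot for numerical stability; this sketch does not.
    """
    n = len(a)
    # U starts as a copy of the input; L starts as the identity matrix.
    u = [[complex(x) for x in row] for row in a]
    l = [[complex(1) if i == j else complex(0) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k) below the pivot.
            factor = u[i][k] / pivot
            l[i][k] = factor
            u[i][k] = 0j
            # Update the rest of row i in the trailing submatrix.
            for j in range(k + 1, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def matmul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 3x3 test matrix with nonzero pivots.
a = [[2 + 1j, 1 - 1j, 0 + 2j],
     [4 + 0j, 3 + 3j, 1 - 1j],
     [1 + 2j, 0 - 1j, 5 + 0j]]

l, u = lu_decompose(a)
product = matmul(l, u)
# Largest entrywise difference between L * U and the original matrix.
max_error = max(abs(product[i][j] - a[i][j])
                for i in range(3) for j in range(3))
```

The innermost loop is the trailing-submatrix update, which dominates the operation count for large matrices. An out-of-core version such as the one the abstract describes would have to apply such updates to pieces of the matrix that fit in the faster memory levels, but the details of that partitioning are not given here.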

Article Details

Section
Proposal for Special Issue Papers