Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system
Abstract
Lower/Upper triangular (LU) factorization plays an important role in
scientific and high performance computing. This paper presents an
implementation of the LU decomposition algorithm for double-precision
complex numbers on a star-topology-based multi-FPGA
platform. The out-of-core implementation moves data through multiple
levels of a hierarchical memory system (hard disk, DDR SDRAMs, and
FPGA block RAMs) using fully pipelined data paths in all steps
of the algorithm. Detailed performance numbers for all phases of
the algorithm are presented and compared to a highly optimized
implementation for a low-power microprocessor-based system. We also
compare the performance per watt of the FPGA and the microprocessor
systems. Finally, we recommend improvements to the FPGA design that
would increase the performance of double-precision complex LU
factorization on the FPGA-based system.
Article Details
Section
Proposal for Special Issue Papers