Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures
Hatem Ltaief, Jakub Kurzak, and Jack Dongarra
The objective of this paper is to extend, in the context of multicore architectures, the concept of algorithms-by-tiles [Buttari et al., 2007] for the Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a preprocessing step for calculating the singular value decomposition. In the Top500 list from June 2008, 98% of the fastest parallel systems in the world were based on multicores. This trend toward manycore hardware makes it critical to efficiently integrate existing or new numerical linear algebra algorithms suited to such architectures. By exploiting the concept of algorithms-by-tiles in the multicore environment (i.e., a high level of parallelism with fine granularity and a high-performance data representation combined with dynamic, data-driven execution), the band bidiagonal reduction presented here achieves 94 Gflop/s on a 12000 × 12000 matrix with 16 Intel Tigerton 2.4 GHz processors.
Published 2008-10-01 04:00:00 as ut-cs-08-631 (ID:106)
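
To make the target of the reduction concrete, the following is a minimal sketch of what "band bidiagonal form" means: an orthogonal reduction of a dense matrix to an upper triangular matrix with only b superdiagonals. It is not the tiled, dynamically scheduled algorithm of the paper. As an assumption made for brevity, it uses element-wise Givens rotations in pure Python rather than tile kernels, and the names `givens`, `band_bidiagonalize`, `frobenius_norm`, and `outside_band_max` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def band_bidiagonalize(A, b):
    """Reduce A (a list of rows) to upper band form with b >= 1 superdiagonals.

    Illustrative only: applies Givens rotations from the left and right,
    so the result equals Q^T A P for orthogonal Q and P (not formed here).
    """
    B = [row[:] for row in A]
    m, n = len(B), len(B[0])
    for k in range(min(m, n)):
        # Left rotations on rows (k, i): zero column k below the diagonal.
        for i in range(k + 1, m):
            c, s = givens(B[k][k], B[i][k])
            for j in range(k, n):
                x, y = B[k][j], B[i][j]
                B[k][j] = c * x + s * y
                B[i][j] = -s * x + c * y
            B[i][k] = 0.0  # remove rounding residue
        # Right rotations on columns (k + b, j): zero row k beyond the band.
        p = k + b
        for j in range(p + 1, n):
            c, s = givens(B[k][p], B[k][j])
            for i in range(k, m):
                x, y = B[i][p], B[i][j]
                B[i][p] = c * x + s * y
                B[i][j] = -s * x + c * y
            B[k][j] = 0.0  # remove rounding residue
    return B


def frobenius_norm(M):
    """Frobenius norm, which orthogonal transformations leave unchanged."""
    return math.sqrt(sum(x * x for row in M for x in row))


def outside_band_max(M, b):
    """Largest absolute entry below the diagonal or above superdiagonal b."""
    return max(
        (abs(x) for i, row in enumerate(M) for j, x in enumerate(row)
         if j < i or j > i + b),
        default=0.0,
    )


if __name__ == "__main__":
    random.seed(0)
    A = [[random.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(6)]
    B = band_bidiagonalize(A, 2)
    print("max entry outside band:", outside_band_max(B, 2))
    print("norm before/after:", frobenius_norm(A), frobenius_norm(B))
```

Because only orthogonal transformations are applied, the singular values of the band matrix match those of the input, which is why such a form can serve as a preprocessing step for the singular value decomposition. A tiled variant would replace the scalar rotations with block operations on tiles, which this sketch does not attempt.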