
EECS Publication

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems

Fengguang Song, Hatem Ltaief, Bilel Hadri, and Jack Dongarra

As tile linear algebra algorithms continue to achieve high performance on shared-memory multi-core architectures, making them scalable on distributed-memory multi-core clusters remains a challenging task. The main contribution of this paper is to extend to the distributed-memory environment the earlier work of Hadri et al. on Communication-Avoiding QR (CAQR) factorization using tile algorithms for tall and skinny matrices, which was originally developed for shared-memory multi-core systems. Combining the fine granularity of tile algorithms with communication-avoiding techniques for the QR factorization exposes a high degree of parallelism: multiple tasks can execute concurrently, and the computation steps can be fully pipelined. A decentralized dynamic scheduler is integrated as a runtime system to schedule tasks efficiently across the distributed resources. Experiments on two Beowulf clusters (with dual-core and 8-core nodes, respectively) and a Cray XT5 system with 12-core nodes show that the tile CAQR factorization outperforms the de facto standard ScaLAPACK library by up to 4 times for tall and skinny matrices, and scales well on up to 3,072 cores.
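The core idea behind communication-avoiding QR for a tall and skinny matrix can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a simple binary reduction tree, uses modified Gram-Schmidt in place of the Householder-based tile kernels a production code would use, and computes only the R factor. The function names `qr_r_factor` and `tsqr_r_factor` are hypothetical. Each row block is factored independently, so the blocks could live on different nodes, and only the small n-by-n R factors are combined.

```python
def qr_r_factor(rows):
    """Return the upper-triangular R of a thin QR of `rows` (m x n, m >= n).

    Uses modified Gram-Schmidt for brevity and assumes full column rank.
    Q is not formed explicitly.
    """
    m, n = len(rows), len(rows[0])
    # Work column by column, since Gram-Schmidt orthogonalizes columns.
    cols = [[rows[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in cols[j]) ** 0.5
        q = [x / r[j][j] for x in cols[j]]
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def tsqr_r_factor(rows, num_blocks):
    """R factor of a tall and skinny matrix via a binary reduction tree.

    Assumes every row block has at least as many rows as columns.
    """
    size = len(rows) // num_blocks
    blocks = [rows[b * size:(b + 1) * size] for b in range(num_blocks - 1)]
    blocks.append(rows[(num_blocks - 1) * size:])  # last block takes the remainder

    # Step 1: independent local factorizations, one per row block.
    rs = [qr_r_factor(block) for block in blocks]

    # Step 2: pairwise reduction. Stack two n x n R factors and refactor.
    while len(rs) > 1:
        merged = [qr_r_factor(rs[i] + rs[i + 1]) for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])  # an unpaired factor moves up one level
        rs = merged
    return rs[0]
```

Because both routines produce an R with a positive diagonal, `tsqr_r_factor(a, p)` should agree with `qr_r_factor(a)` up to rounding error for a full-rank matrix `a`, while only n-by-n triangles ever need to be exchanged between blocks.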

Published  2010-04-15 04:00:00  as  ut-cs-10-653 (ID:55)

ut-cs-10-653.pdf
