EECS Publication
Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems
Fengguang Song, Asim YarKhan, and Jack Dongarra
Multicore systems have become increasingly important in both shared-memory and distributed-memory environments. This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared- or distributed-memory). We use a task-based library to replace existing linear algebra subroutines such as PBLAS, transparently providing the same interface and computational functionality as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. Our runtime system design focuses mainly on performance scalability. We propose a distributed algorithm that resolves data dependences without cooperation between processes. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky factorization, LU factorization, and QR factorization. Our experiments on both shared-memory machines (16-core Intel Tigerton, 32-core IBM Power6) and distributed-memory machines (e.g., Cray XT4 using 1024 cores) demonstrate that our runtime system achieves good scalability. Furthermore, we provide an analysis that shows why the tiled algorithms are scalable and gives their expected execution time.
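To illustrate what a task-based formulation with data dependences can look like, the following is a minimal sketch for a tiled Cholesky factorization. It is not the runtime system or the distributed dependence algorithm described in the report. It is a sequential, single-process illustration under our own assumptions: the function names (tiled_cholesky_tasks, build_dependences, ready_waves) are hypothetical, tasks are named after the usual tile kernels (POTRF, TRSM, SYRK, GEMM), and dependences are derived only from which tiles each task reads and writes.

```python
from collections import defaultdict


def tiled_cholesky_tasks(nt):
    """List the tasks of a right-looking tiled Cholesky on an nt x nt tile grid.

    Each task is (name, reads, write); tiles are (row, col) pairs.
    The written tile is also read by the task (read-modify-write).
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], (k, k)))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], (i, k)))
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{k})", [(i, k)], (i, i)))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], (i, j)))
    return tasks


def build_dependences(tasks):
    """Derive predecessor sets from tile accesses in sequential program order."""
    last_writer = {}                          # tile -> last task that wrote it
    readers_since_write = defaultdict(list)   # tile -> readers since that write
    deps = {}
    for name, reads, write in tasks:
        preds = set()
        for tile in reads:                    # read-after-write
            if tile in last_writer:
                preds.add(last_writer[tile])
        if write in last_writer:              # write-after-write
            preds.add(last_writer[write])
        preds.update(readers_since_write[write])  # write-after-read
        deps[name] = preds
        for tile in reads:
            readers_since_write[tile].append(name)
        last_writer[write] = name
        readers_since_write[write] = []
    return deps


def ready_waves(deps):
    """Group tasks into waves; every task in a wave has all predecessors done."""
    remaining = {task: set(preds) for task, preds in deps.items()}
    waves = []
    while remaining:
        wave = sorted(task for task, preds in remaining.items() if not preds)
        if not wave:
            raise ValueError("dependence cycle detected")
        waves.append(wave)
        for task in wave:
            del remaining[task]
        for preds in remaining.values():
            preds.difference_update(wave)
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(ready_waves(build_dependences(tiled_cholesky_tasks(3)))):
        print(step, wave)
```

Tasks in the same wave have no dependences on one another, so a dynamic scheduler would be free to run them concurrently. A real runtime would typically start a task as soon as its own predecessors finish rather than wait for a whole wave.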
Published 2009-04-13 04:00:00 as ut-cs-09-638 (ID:70)