
EECS Publication

Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

Fengguang Song, Stanimire Tomov, Jack Dongarra

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multi-core and multi-GPU system to support matrix computations efficiently. Our approach achieves four objectives: a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine and to use a heterogeneous 1-D block cyclic distribution to allocate data to the host system and the GPUs so as to minimize communication. To cope with processor heterogeneity, we have developed heterogeneous rectangular-tile algorithms with two different tile sizes (one for CPU cores and the other for GPUs). We also propose an auto-tuning method that determines the tile sizes that best attain both high performance and load balancing. We have implemented a new runtime system and applied it to the rectangular-tile Cholesky and QR factorizations. Our experiments on a compute node with two Intel Westmere hexa-core CPUs and three Nvidia Fermi GPUs demonstrate the weak scalability, strong scalability, load balance, and efficiency of our approach.
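
To make the idea of a heterogeneous 1-D block cyclic distribution concrete, the following is a minimal Python sketch and not the report's implementation. It assumes that tile columns are dealt out in a repeating pattern in which each device appears in proportion to an integer weight. The function name `heterogeneous_block_cyclic`, the device labels, and the 1:3:3:3 weights are all hypothetical and chosen only for illustration; the abstract does not state how the actual ratio is determined beyond mentioning auto-tuning.

```python
from collections import Counter


def heterogeneous_block_cyclic(num_tile_cols, weights):
    """Assign each tile column to a device using a weighted cyclic pattern.

    weights maps a device label to a positive integer: how many tile
    columns that device receives per cycle (a hypothetical stand-in for
    relative device speed).
    """
    # Build one cycle by repeated round-robin passes, so that a device
    # with a larger weight is interleaved rather than given one long run.
    pattern = []
    remaining = dict(weights)
    while any(remaining.values()):
        for device in weights:
            if remaining[device] > 0:
                pattern.append(device)
                remaining[device] -= 1
    # Repeat the cycle across all tile columns (1-D: columns only).
    return [pattern[j % len(pattern)] for j in range(num_tile_cols)]


# Hypothetical setup: the host (all CPU cores together) plus three GPUs,
# with each GPU assumed to take three tile columns per host column.
weights = {"cpu": 1, "gpu0": 3, "gpu1": 3, "gpu2": 3}
owners = heterogeneous_block_cyclic(20, weights)
counts = Counter(owners)
print(owners)
print(counts)
```

Because every tile in a column lives on the same device in this sketch, a column-wise update needs no data from other devices, which is one plausible reading of how such a layout helps limit communication; the weights are where load balancing would enter.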

Published 2011-03-28 04:00:00 as ut-cs-11-668 (ID:30)

ut-cs-11-668.pdf
