Providing GPU Capability to LU and QR within the ScaLAPACK Framework
Peng Du, Stanimire Tomov, and Jack Dongarra
In the field of dense matrix computations on distributed memory systems, ScaLAPACK has established its importance over the years through its high performance and scalability. Since the introduction of CUDA-based GPGPU computing in 2008, methods to efficiently use such computing power on distributed memory systems equipped with multi-core CPUs have attracted much attention. In this work we integrate CUDA computing directly into the ScaLAPACK framework and demonstrate that good speedup can be achieved on routines such as LU and QR by carefully managing GPU-CPU data transfers. The objective is to eventually convert most of the ScaLAPACK routines to support GPU computing, so that current application codes that already use ScaLAPACK get a 'free' (automatic) speedup when GPUs are present.
Published 2012-09-12 as ut-cs-12-699