EECS Publication

Accelerating GPU Kernels for Dense Linear Algebra

Rajib Nath, Stanimire Tomov, and Jack Dongarra

Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries and must therefore be highly optimized. We present techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting, a set of GPU-specific optimization techniques, allows us to easily remove the performance oscillations associated with problem dimensions that are not divisible by fixed blocking sizes. For example, when applied to the matrix-matrix multiplication routines, and depending on the hardware configuration and routine parameters, this can yield algorithms that are two times faster. Similarly, matrix-vector multiplication can be accelerated by more than a factor of two in both single and double precision arithmetic. Additionally, GPU-specific acceleration techniques are applied to develop new kernels (e.g., syrk, symv) that are up to 20x faster than those currently available. We present these kernels and also show their acceleration effect on higher-level dense linear algebra routines. The accelerated kernels are now freely available through the MAGMA BLAS library.
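The abstract does not spell out how Pointer Redirecting works, so the following is only a minimal sketch of one plausible reading of the idea, written in plain Python rather than as a GPU kernel. It assumes that an index falling outside the matrix is redirected (clamped) to the nearest valid element so that every block can run the same full-size loop, that redirected inner-dimension reads are weighted by zero, and that results for out-of-range rows are simply not written back. The function name `gemv_redirected` and the `block` parameter are hypothetical and are not taken from MAGMA BLAS.

```python
def gemv_redirected(A, x, block=4):
    """Compute y = A x in fixed-size blocks, redirecting out-of-range indices.

    Illustrative assumption only: this emulates on the CPU what a blocked
    GPU kernel might do; it is not the MAGMA BLAS implementation.
    """
    m, n = len(A), len(x)
    y = [0.0] * m
    # Round the block grid up so partial blocks are still processed in full.
    row_blocks = (m + block - 1) // block
    col_blocks = (n + block - 1) // block
    for bi in range(row_blocks):
        for t in range(block):  # one simulated "thread" per row in the block
            row = bi * block + t
            safe_row = min(row, m - 1)  # redirect reads past the last row
            acc = 0.0
            for bj in range(col_blocks):
                for k in range(block):
                    col = bj * block + k
                    safe_col = min(col, n - 1)  # redirect reads past the last column
                    # Redirected columns must not contribute to the dot product.
                    weight = 1.0 if col < n else 0.0
                    acc += weight * A[safe_row][safe_col] * x[safe_col]
            if row < m:  # discard results computed for redirected rows
                y[row] = acc
    return y


# Usage: a 5 x 7 matrix, neither dimension divisible by the block size 4.
A = [[float(i * 7 + j) for j in range(7)] for i in range(5)]
x = [float(j + 1) for j in range(7)]
y = gemv_redirected(A, x, block=4)
```

The point of the sketch is that the inner loops have no special case for the partial blocks at the matrix edge: every block executes the same code path, and only the final write is guarded.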

Published 2009-12-18 05:00:00 as ut-cs-09-648 (ID:80)

ut-cs-09-648.pdf


The University of Tennessee, Knoxville

Min H. Kao Department of Electrical Engineering & Computer Science
