EECS Publication

High-performance Matrix-matrix Multiplications of Very Small Matrices

Ian Masliah, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Marc Baboulin, Joel Falcou, and Jack Dongarra

The general dense matrix-matrix multiplication (GEMM) is fundamental to obtaining high performance in many scientific computing applications. However, GEMMs for small matrices (of sizes less than 32) are not sufficiently optimized in existing libraries. In this paper, we consider the case of many small GEMMs on either CPU or GPU architectures, a case that often occurs in applications such as big data analytics, machine learning, and high-order FEM. The GEMMs are grouped together in a single batched routine. We present algorithms and optimization techniques specialized for these cases that obtain performance within 90% of the optimal. We show that these results outperform currently available state-of-the-art implementations and vendor-tuned math libraries.
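To make the idea of "grouping many small GEMMs into a single batched routine" concrete, the sketch below shows only the semantics of such a call: one invocation computes C_i = alpha * A_i * B_i + beta * C_i for every problem i in the batch. The name `gemm_batched` and its argument order are hypothetical, chosen for illustration; this is a plain reference loop, not the optimized CPU/GPU kernels described in the report. In a tuned implementation, the value of the batched interface is that one call exposes the whole batch at once, so the library can schedule the independent small problems across cores or GPU thread blocks instead of paying per-call overhead for each tiny GEMM.

```python
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def gemm_batched(alpha: float, A: List[Matrix], B: List[Matrix],
                 beta: float, C: List[Matrix]) -> List[Matrix]:
    """Reference batched GEMM: returns R[i] = alpha * A[i] @ B[i] + beta * C[i].

    Hypothetical helper for illustration only (not the report's kernels).
    A[i] is m x k, B[i] is k x n, C[i] is m x n. The inputs are not modified.
    """
    if not (len(A) == len(B) == len(C)):
        raise ValueError("A, B and C must contain the same number of matrices")
    results = []
    # Each problem in the batch is independent: an optimized library would
    # run these iterations in parallel rather than one after another.
    for Ai, Bi, Ci in zip(A, B, C):
        m, k, n = len(Ai), len(Bi), len(Bi[0])
        Ri = [[0.0] * n for _ in range(m)]
        for r in range(m):
            for c in range(n):
                acc = 0.0
                for p in range(k):  # dot product of row r of A[i] and column c of B[i]
                    acc += Ai[r][p] * Bi[p][c]
                Ri[r][c] = alpha * acc + beta * Ci[r][c]
        results.append(Ri)
    return results


if __name__ == "__main__":
    # Batch of two 2x2 problems: A_0 = I and A_1 = 2I, both multiplied by the same B.
    A = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    B = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
    C = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(gemm_batched(1.0, A, B, 0.0, C))
    # prints [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]]]
```

Because every problem here has the same fixed small size, a specialized implementation can exploit that uniformity; this loop does not, and is meant only as a correctness reference for the batched interface.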

Published 2016-03-11 05:00:00 as ut-eecs-16-740 (ID:599)

ut-eecs-16-740.pdf


The University of Tennessee, Knoxville

Min H. Kao Department of Electrical Engineering & Computer Science
