
EECS Publication

Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modelling

Jack Dongarra, Piotr Luszczek

We present a modelling framework that accurately predicts the time to run dense linear algebra calculations. We report the framework's accuracy in a number of varied computational environments, such as shared-memory multi-core systems, clusters, and large supercomputing installations with tens of thousands of cores. We also test the accuracy for various algorithms, each of which has different scaling properties and a different tolerance to low-bandwidth/high-latency interconnects. The predictive accuracy is very good, on the order of the measurement accuracy, which makes the method suitable for both dedicated and non-dedicated environments. We also present a practical application of our model: reducing the time required to tune and optimize large parallel runs whose time is dominated by linear algebra computations. We show practical examples of how to apply the methodology to avoid common pitfalls and to reduce the influence of measurement errors and inherent performance variability.
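As a rough illustration of predicting a large run from cheap measurements, the sketch below fits a simple timing model to a few small problem sizes and extrapolates to a larger one. The model form t(n) ≈ a·n³ + b·n² is an assumption, chosen only because dense factorizations typically perform on the order of n³ floating-point operations; the abstract does not state the form of the authors' model. The function names `fit_cubic_model` and `predict_time` are hypothetical.

```python
def fit_cubic_model(sizes, times):
    """Least-squares fit of t(n) ~ a*n**3 + b*n**2 (assumed model form).

    Solves the 2x2 normal equations directly with Cramer's rule,
    so no external libraries are needed.
    """
    s66 = sum(n**6 for n in sizes)   # sum of (n^3 * n^3)
    s65 = sum(n**5 for n in sizes)   # sum of (n^3 * n^2)
    s44 = sum(n**4 for n in sizes)   # sum of (n^2 * n^2)
    r3 = sum(t * n**3 for n, t in zip(sizes, times))
    r2 = sum(t * n**2 for n, t in zip(sizes, times))
    det = s66 * s44 - s65 * s65
    if det == 0:
        raise ValueError("need at least two distinct problem sizes")
    a = (r3 * s44 - r2 * s65) / det
    b = (s66 * r2 - s65 * r3) / det
    return a, b


def predict_time(model, n):
    """Evaluate the fitted model at problem size n."""
    a, b = model
    return a * n**3 + b * n**2


if __name__ == "__main__":
    # Hypothetical timings (seconds) from short runs at small sizes.
    small_sizes = [1000, 2000, 3000, 4000]
    measured = [0.26, 1.85, 5.90, 13.70]
    model = fit_cubic_model(small_sizes, measured)
    print("predicted time for n=20000:", predict_time(model, 20000))
```

In practice each small size would be timed several times and a robust statistic such as the median used, since a single noisy measurement can noticeably shift an extrapolation.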

Published 2010-10-08 04:00:00 as ut-cs-10-661 (ID:63)

ut-cs-10-661.pdf
