
EECS Publication

Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures

Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, and Jack Dongarra

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms. These consist of scheduling a Directed Acyclic Graph (DAG) of fine-grained tasks, where nodes represent tasks (either the factorization of a panel or the update of a block-column) and edges represent the dependencies among them. Past approaches already achieve high performance on moderate and large square matrices. However, they process each panel sequentially, which limits performance when factorizing tall and skinny matrices or small square matrices. We present a fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that factorizes a panel in parallel (Communication-Avoiding QR, initially designed for distributed-memory machines) to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvements (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multicore Architectures (PLASMA) library.
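The panel-level idea behind Communication-Avoiding QR can be illustrated with a tall-and-skinny QR (TSQR) reduction. The panel is split into row blocks, each block is factored independently, and the small R factors are then combined pairwise up a binary tree. The following is a minimal, sequential Python sketch under simplifying assumptions. It computes only the R factor and uses modified Gram-Schmidt in place of the Householder-based tile kernels a library such as PLASMA would use. The names `mgs_qr_r` and `tsqr_r` are hypothetical. It is not the authors' implementation and performs no asynchronous DAG scheduling.

```python
import math
import random


def mgs_qr_r(a):
    """R factor of a thin QR of `a` (list of rows, at least as many rows as
    columns) via modified Gram-Schmidt. For full column rank, R has a positive
    diagonal, which makes it unique."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # column-major copy
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in cols[j]))
        r[j][j] = norm
        q = [x / norm for x in cols[j]]  # j-th orthonormal column
        for k in range(j + 1, n):
            # Project q out of every later column (the "modified" GS update).
            dot = sum(qi * ck for qi, ck in zip(q, cols[k]))
            r[j][k] = dot
            cols[k] = [ck - dot * qi for qi, ck in zip(q, cols[k])]
    return r


def tsqr_r(a, num_blocks):
    """Hypothetical binary-tree TSQR sketch: returns (R, merge_levels)."""
    m, n = len(a), len(a[0])
    bounds = [b * m // num_blocks for b in range(num_blocks + 1)]
    if any(bounds[b + 1] - bounds[b] < n for b in range(num_blocks)):
        raise ValueError("each row block needs at least as many rows as columns")
    # Leaves: independent QR of each row block (parallel tasks in principle).
    rs = [mgs_qr_r(a[bounds[b]:bounds[b + 1]]) for b in range(num_blocks)]
    levels = 0
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs) - 1, 2):
            # Stack two n-by-n R factors (2n-by-n) and re-factor them.
            merged.append(mgs_qr_r(rs[i] + rs[i + 1]))
        if len(rs) % 2 == 1:
            merged.append(rs[-1])  # odd leftover moves up one level unchanged
        rs = merged
        levels += 1
    return rs[0], levels


if __name__ == "__main__":
    rng = random.Random(42)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(16)]  # 16x3
    r_direct = mgs_qr_r(a)
    r_tree, levels = tsqr_r(a, num_blocks=4)
    err = max(abs(x - y) for rx, ry in zip(r_tree, r_direct) for x, y in zip(rx, ry))
    print(f"merge levels: {levels}, max |R_tree - R_direct| = {err:.2e}")
```

Every leaf factorization, and every merge within one tree level, is independent of the others, so they could run as concurrent tasks. In this simplified model, the panel's critical path then grows with the number of tree levels (about log2 of the number of row blocks, 2 for the 4 blocks above) rather than with the number of blocks, as it does when the blocks are processed in sequence. Since the R factor with a positive diagonal is unique for a full-rank matrix, the tree result can be checked against a direct factorization up to rounding error.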

Published 2009-09-04 as ut-cs-09-645 (ID:77)

