Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, and Jack Dongarra
To exploit the potential of multicore architectures, recent dense linear algebra libraries use tile algorithms, which schedule a Directed Acyclic Graph (DAG) of fine-grained tasks: nodes represent tasks (either the factorization of a panel or the update of a block-column) and edges represent the dependencies among them. Although previous approaches already achieve high performance on moderate and large square matrices, they process each panel sequentially, which limits performance when factorizing tall and skinny matrices or small square matrices. We present a fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs the panel factorization in parallel, Communication-Avoiding QR (CAQR), initially designed for distributed-memory machines, to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) over state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multicore Architectures (PLASMA) library.
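The parallel panel factorization mentioned in the abstract rests on a reduction tree: the tall panel is split into row blocks, each block is factored independently, and the resulting triangular factors are merged pairwise until one remains. The following is a minimal NumPy sketch of that idea only, not the tile implementation described in this report. The function name `tsqr_r`, the `num_blocks` parameter, and the choice of a binary tree are assumptions made for illustration, and the loops run sequentially where a real implementation would schedule the independent factorizations as asynchronous tasks.

```python
import numpy as np


def tsqr_r(A, num_blocks=4):
    """Return the R factor of a tall and skinny matrix A via a binary reduction tree.

    Illustrative sketch: hypothetical helper, sequential execution.
    """
    m, n = A.shape
    if m < n * num_blocks:
        raise ValueError("each row block needs at least n rows")

    # Leaf level: factor every row block independently.
    # These factorizations do not depend on each other, so they could run in parallel.
    blocks = np.array_split(A, num_blocks, axis=0)
    rs = [np.linalg.qr(block, mode="r") for block in blocks]

    # Reduction levels: stack pairs of R factors and factor the stacked matrix again.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs) - 1, 2):
            stacked = np.vstack((rs[i], rs[i + 1]))
            merged.append(np.linalg.qr(stacked, mode="r"))
        if len(rs) % 2 == 1:
            # An unpaired factor is carried up to the next level unchanged.
            merged.append(rs[-1])
        rs = merged

    return rs[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 8))
    R = tsqr_r(A, num_blocks=4)
    # R agrees with a direct factorization up to the sign of each row.
    R_ref = np.linalg.qr(A, mode="r")
    print(np.allclose(np.abs(R), np.abs(R_ref)))
```

Only the R factor is formed here to keep the sketch short; recovering Q would require keeping the orthogonal factors produced at every node of the tree.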
Published 2009-09-04 04:00:00 as ut-cs-09-645 (ID:77)