All Publications
- On block-asynchronous execution on GPUs, November 2016, ut-eecs-16-746.pdf
- High Performance Realtime Convex Solver for Embedded Systems, October 2016, ut-eecs-16-745.pdf
- 2016 Dense Linear Algebra Software Packages Survey, September 2016, ut-eecs-16-744.pdf
- LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi, June 2016, ut-eecs-16-743.pdf
- Report on the Sunway TaihuLight System, June 2016, ut-eecs-16-742.pdf
- Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems, April 2016, ut-eecs-16-741.pdf
- High-performance Matrix-matrix Multiplications of Very Small Matrices, March 2016, ut-eecs-16-740.pdf
- Performance, Design, and Autotuning of Batched GEMM for GPUs, February 2016, ut-eecs-16-739.pdf