EECS Publication

Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources

Zizhong Chen and Jack Dongarra

As scientists' desire to perform ever larger computations drives the size of today's high-performance computers from hundreds, to thousands, and even to tens of thousands of processors, node failures in these computers are becoming frequent events.

Published 2005-04-20 04:00:00 as ut-cs-05-561 (ID:163)

ut-cs-05-561.pdf
