EECS Publication

Optimal Checkpointing Period: Time vs. Energy

Guillaume Aupy, Anne Benoit, Thomas Herault, Yves Robert, and Jack Dongarra

This short paper deals with parallel scientific applications that use non-blocking, periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication is expected to increase dramatically, in terms of both latency and consumed energy, on future Exascale platforms.

Published 2013-10-14 04:00:00 as ut-eecs-13-718 (ID:574)

ut-eecs-13-718.pdf
