NwsAlarm: A Tool for Accurately Detecting Resource Performance Degradation
C. Krintz and R. Wolski
End-users of high-performance computing resources have come to expect consistent levels of performance for their applications. The advancement of the Computational Grid enables these users to make seamless use of a multitude of computing resources. Together, these developments have generated a need for users to monitor the end-to-end performance available to an application. In addition, when performance degrades, users should be alerted so that dynamic resource selection decisions can be adjusted as necessary. In this work, we present the NwsAlarm, a Java-based utility that enables users to monitor the performance level of any resource being monitored by the Network Weather Service. The NwsAlarm requires no special privileges to acquire this information; a user invokes it simply by clicking on a web-page link. More importantly, the NwsAlarm allows administrators (or any user of the NwsAlarm) to register and set expected performance levels. When performance falls below these thresholds, administrators are immediately notified via email. The NwsAlarm uses predictions of performance measurements, rather than the raw measurements themselves, to filter out false alarms. We demonstrate the importance and accuracy of the NwsAlarm with real examples of performance degradation caused by routing table changes and loss of service on Abilene, the Internet-2 research network used for experimentation with evolving Grid software technology. On average, the NwsAlarm raises 92% fewer false alarms than an approach based on raw measurements.
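The idea of filtering false alarms through prediction can be illustrated with a minimal sketch. It is not the NwsAlarm implementation: the function names, the sample data, and the threshold are hypothetical, and a sliding-window median stands in for whatever forecasting the Network Weather Service actually applies. The sketch only contrasts an alarm raised on each raw measurement with an alarm raised on a forecast derived from recent history.

```python
from collections import deque
from statistics import median


def raw_alarms(measurements, threshold):
    """Indices at which the raw measurement itself is below the threshold."""
    return [i for i, m in enumerate(measurements) if m < threshold]


def predicted_alarms(measurements, threshold, window=5):
    """Indices at which a forecast is below the threshold.

    Assumption: the forecast is the median of the last `window`
    measurements, a simple stand-in for a real forecaster.
    No alarm is raised until the window is full.
    """
    history = deque(maxlen=window)
    alarms = []
    for i, m in enumerate(measurements):
        history.append(m)
        if len(history) == window and median(history) < threshold:
            alarms.append(i)
    return alarms


if __name__ == "__main__":
    # Hypothetical bandwidth readings: two isolated dips, then a sustained drop.
    bandwidth = [90, 92, 15, 91, 89, 93, 12, 90, 20, 18, 17, 16, 15]
    print(raw_alarms(bandwidth, threshold=50))
    print(predicted_alarms(bandwidth, threshold=50))
```

In this hypothetical series, the raw comparison fires on the two isolated dips as well as on the sustained drop, while the forecast-based comparison fires only once the drop persists. The cost is a short detection delay, which depends on the window size.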
Published 2000-11-01 05:00:00 as ut-cs-00-452 (ID:275)