Communication-aware Approaches for Transparent Checkpointing in Cloud Computing

Samy Sadi; Belabbas Yagoubi

doi:10.12694/scpe.v17i3.1184

Authors

Samy Sadi
Belabbas Yagoubi

DOI:

https://doi.org/10.12694/scpe.v17i3.1184

Abstract

Checkpoint/Restart, or checkpointing, is a fault tolerance technique that consists of taking frequent snapshots of an application so that, in the event of a failure, the application's state can be restored and its execution continued without necessarily restarting it. The advent of Cloud Computing brought new challenges for this technique, as fault tolerance needs to be supplied transparently in environments running highly heterogeneous applications. In this context, we propose two new fully transparent checkpointing approaches. Both approaches use communication-induced checkpointing and guarantee a consistent view of the applications with regard to the outside world process. The first approach is uncoordinated and creates checkpoints for applications independently. The second approach is coordinated: applications are first grouped into clusters before the checkpointing process is started. We have compared the proposed approaches with state-of-the-art approaches. The results show that our approaches perform better with respect to communication latencies and the overhead on the execution of the Virtual Machines.
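
For readers unfamiliar with the basic technique, the general checkpoint/restart idea described in the first sentence of the abstract can be illustrated with a minimal single-process sketch. This is not the approach proposed in the paper: it does not cover communication-induced, coordinated, or VM-level checkpointing, and all names (`save_checkpoint`, `load_checkpoint`, `run`, `fail_at`) are hypothetical and chosen only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write a snapshot of `state` to `path` via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    # Rename over the old snapshot so a crash never leaves a half-written file.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no snapshot exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum 0..total_steps-1, resuming from the last snapshot if one exists.

    `fail_at` is an illustrative hook that simulates a failure at a given step.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

In this sketch, a run that fails partway through can be called again with the same `path`: it resumes from the most recent snapshot and only repeats the steps executed after that snapshot was taken.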
