A new decentralized periodic replication strategy for dynamic data grids
Abstract
Data grids provide a scalable infrastructure for managing storage resources and data files, supporting data-intensive applications that need to access huge amounts of data stored at distributed locations around the world. In many applications, these data reach the scale of terabytes or even petabytes. Such applications must efficiently access, store, transfer and analyze large amounts of data across geographically distributed locations. Replication is a simple and general technique used in data grids to achieve these goals: its main purposes are improving data access efficiency, providing high availability, decreasing bandwidth consumption, improving fault tolerance and enhancing scalability. In this paper, we propose a new classification of replication strategies based on two complementary criteria, together with a survey of the resulting categories of strategies. In addition, we introduce a new decentralized periodic replication strategy for dynamic data grids with limited replica storage, called DPRSKP, which stands for Decentralized Periodic Replication Strategy based on Knapsack Problem. The strategy takes the changing availability of sites into account. DPRSKP relies on two polynomial-time algorithms: the first selects the best candidate files for replication, and the second places them at the best locations. The replication problem in DPRSKP is formulated as a Knapsack problem, and DPRSKP extends the well-known LRU and LFU strategies. Simulation experiments were carried out with OptorSim using a dynamic period rather than a static one. The results show that DPRSKP effectively improves response time, bandwidth consumption, and the numbers of remote and local file accesses compared with other replication strategies.
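As a rough illustration, and not the authors' implementation, the file-selection step described in the abstract can be modelled as a 0/1 knapsack instance: each candidate file has a size and a value (for example, its access frequency), and the free replica storage of the target site is the knapsack capacity. The Python sketch below applies the standard dynamic-programming solution; the file names, sizes, values and capacity are hypothetical.

# Minimal sketch of knapsack-based replica selection (illustrative only;
# file names, sizes, values and capacity are hypothetical, not from the paper).
def select_files_for_replication(files, capacity):
    """files: list of (name, size, value) tuples; capacity: free replica storage.
    Returns a subset of file names with maximum total value whose total size
    fits within capacity (classic 0/1 knapsack dynamic programming)."""
    n = len(files)
    dp = [0] * (capacity + 1)                # dp[c] = best value achievable with capacity c
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i, (_, size, value) in enumerate(files):
        for c in range(capacity, size - 1, -1):
            if dp[c - size] + value > dp[c]:
                dp[c] = dp[c - size] + value
                keep[i][c] = True            # file i taken at capacity c
    # Backtrack to recover the selected files
    chosen, c = [], capacity
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(files[i][0])
            c -= files[i][1]
    return chosen

# Example: choose files to replicate into 10 units of free storage
print(select_files_for_replication([("f1", 4, 30), ("f2", 6, 50), ("f3", 3, 20)], 10))
# -> ['f2', 'f1']

A second, separate step would then place the selected replicas on candidate sites, for instance ranked by availability and access cost, as the abstract indicates; that placement logic is not sketched here.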
Article Details
Section: Research Papers