Exact and Heuristic Data Workflow Placement Algorithms for Big Data Computing in Cloud Datacenters

Sonia Ikken; Eric Renault; Abdelkamel Tari; Tahar Kechadi

doi:10.12694/scpe.v19i3.1365

Published: Sep 14, 2018

Sonia Ikken

Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria, and Telecom SudParis, Samovar-UMR 5157 CNRS, University of Paris-Saclay, France

Eric Renault

Telecom SudParis, Samovar-UMR 5157 CNRS, University of Paris-Saclay, France

Abdelkamel Tari

Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria

Tahar Kechadi

UCD School of Computer Science and Informatics, Dublin, Ireland

Abstract

Several big data-driven applications are currently carried out collaboratively on distributed infrastructure. These applications usually deal with experiments at massive scale. The data generated by such experiments are huge and are stored at multiple geographic locations for reuse. Workflow systems, composed of jobs using collaborative task-based models, present new dependency and data exchange needs. This gives rise to new issues when selecting distributed data and storage resources so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach involves two types of data: splittable and unsplittable intermediate data. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort for transferring, storing, and moving the data according to the applications' needs. We first propose an exact algorithm which takes into account the intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that even with divergent conditions, the cost ratio of the heuristic approach is close to the optimal solution.

Issue

Vol. 19 No. 3 (2018)

Section

Proposal for Special Issue Papers
