Performance-efficient Recommendation and Prediction Service for Big Data frameworks focusing on Data Compression and In-memory Data Storage Indicators

Hrachya Astsatryan; Arthur Lalayan; Aram Kocharyan; Daniel Hagimont

doi:10.12694/scpe.v22i4.1945

Published: Nov 26, 2021

Keywords:

Hadoop; Spark; MapReduce; data compression; in-memory file system

Hrachya Astsatryan

Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Armenia

Arthur Lalayan

National Polytechnic University of Armenia, Armenia

Aram Kocharyan

Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Daniel Hagimont

Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Abstract

The MapReduce framework manages Big Data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in Big Data processing to reduce resource-intensive I/O operations and thereby improve the I/O rate. The article presents a performance-efficient, modular, configurable, and robust decision-making service that relies on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job based on metrics and recommends the best configuration parameters to improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, have been evaluated to assess the performance improvement.

Issue

Vol. 22 No. 4 (2021)

Section

Research Papers

Author Biographies

Hrachya Astsatryan, Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Armenia

Associate Professor, Head of the Centre for Scientific Computing (http://csc.iiap.sci.am)

Arthur Lalayan, National Polytechnic University of Armenia, Armenia

Arthur Lalayan is currently a Ph.D. student in computer science at the National Polytechnic University of Armenia (NPUA). He received his Bachelor's and Master's degrees in informatics and computer science from NPUA in 2019 and 2021, respectively. His research interests include large-scale data analytics and optimization.

Aram Kocharyan, Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Aram Kocharyan received his Ph.D. from the Polytechnic National Institute of Toulouse and the Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia in 2019. His main research interests are virtualization, cloud computing, and operating systems.

Daniel Hagimont, Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Daniel Hagimont is a Professor at the Polytechnic National Institute of Toulouse, France, and a member of the IRIT laboratory, where he leads a group working on operating systems, distributed systems, and middleware. He received a Ph.D. from the Polytechnic National Institute of Grenoble, France, in 1993. After a postdoctorate at the University of British Columbia, Vancouver, Canada, in 1994, he joined INRIA Grenoble in 1995.
