Performance-efficient Recommendation and Prediction Service for Big Data frameworks focusing on Data Compression and In-memory Data Storage Indicators

Main Article Content

Hrachya Astsatryan
Arthur Lalayan
Aram Kocharyan
Daniel Hagimont

Abstract

The MapReduce framework manages Big Data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in Big Data processing to reduce resource-intensive I/O operations and thereby improve the I/O rate. The article presents a performance-efficient, modular, configurable, and robust decision-making service that relies on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job based on metrics and recommends the best configuration parameters to improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, have been evaluated to improve their performance.

Article Details

Section
Research Papers
Author Biographies

Hrachya Astsatryan, Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia, Armenia

Associate Professor, Head of the Centre for Scientific Computing (http://csc.iiap.sci.am)

Arthur Lalayan, National Polytechnic University of Armenia, Armenia

Arthur Lalayan is currently a Ph.D. student in computer science at the National Polytechnic University of Armenia (NPUA). He received his Bachelor’s and Master’s degrees in informatics and computer science from NPUA in 2019 and 2021, respectively. His research interests include large-scale data analytics and optimization.

Aram Kocharyan, Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Aram Kocharyan received his Ph.D. from the Polytechnic National Institute of Toulouse and the Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia in 2019. His main research interests are virtualization, cloud computing, and operating systems.

Daniel Hagimont, Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France

Daniel Hagimont is a Professor at the Polytechnic National Institute of Toulouse, France, and a member of the IRIT laboratory, where he leads a group working on operating systems, distributed systems, and middleware. He received a Ph.D. from the Polytechnic National Institute of Grenoble, France, in 1993. After a postdoctorate at the University of British Columbia, Vancouver, Canada, in 1994, he joined INRIA Grenoble in 1995.