Optimizing Hadoop Data Locality: Performance Enhancement Strategies in Heterogeneous Computing Environments


Si-Yeong Kim
Tai-Hoon Kim

Abstract

As organizations increasingly harness big data for analytics and decision-making, efficient processing of massive datasets becomes paramount. Hadoop, a widely adopted distributed computing framework, excels at processing large-scale data, but its performance depends on effective data locality, which is difficult to maintain in heterogeneous computing environments composed of diverse hardware resources. This research addresses the need to improve Hadoop's data locality performance in such environments. The study explores strategies for optimizing data placement and task scheduling that account for the differing characteristics of nodes within the infrastructure. Through a comprehensive analysis of Hadoop's data locality algorithms and their impact on performance, the work proposes novel approaches to mitigate the challenges posed by disparate hardware capabilities. The proposed method combines a Weighted Extreme Learning Machine (WELM) with the Firefly Algorithm (WELM-FF); this integration holds promise for strengthening machine learning models in the context of large-scale data processing. The research employs a combination of theoretical analysis and practical experiments to evaluate the effectiveness of the proposed enhancements. Factors such as network latency, disk I/O, and CPU capability are taken into account to develop a holistic framework for improving data locality and, consequently, overall Hadoop performance. The findings contribute valuable insights to the field of distributed computing and offer practical recommendations for organizations seeking to maximize the efficiency of their Hadoop deployments in heterogeneous computing environments. By addressing the intricacies of data locality, this research aims to enhance the scalability and performance of Hadoop clusters, thereby facilitating more effective utilization of big data resources.
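To make the WELM-FF combination concrete, the sketch below shows a minimal Weighted ELM trained in closed form, with a Firefly Algorithm search over its regularization parameter. This is an illustrative assumption of how the two components can be coupled, not the authors' implementation: the hyperparameter names, search range, tanh activation, and the validation-accuracy fitness are placeholders chosen for the example.

```python
# Illustrative sketch only (assumed coupling of Weighted ELM and the Firefly Algorithm).
import numpy as np

def welm_train(X, y, n_hidden, C, rng):
    """Weighted ELM: random hidden layer, closed-form output weights.
    Per-sample weights w compensate for class imbalance (W = diag(1/class_count))."""
    A = rng.uniform(-1, 1, (X.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1, 1, n_hidden)                 # random biases
    H = np.tanh(X @ A + b)                           # hidden-layer output matrix
    classes, counts = np.unique(y, return_counts=True)
    w = np.array([1.0 / counts[list(classes).index(c)] for c in y])
    T = np.where(y[:, None] == classes, 1.0, -1.0)   # one-vs-all targets
    HtW = H.T * w                                    # H^T W without forming diag(W)
    beta = np.linalg.solve(HtW @ H + np.eye(n_hidden) / C, HtW @ T)
    return A, b, beta, classes

def welm_predict(X, model):
    A, b, beta, classes = model
    return classes[np.argmax(np.tanh(X @ A + b) @ beta, axis=1)]

def firefly_tune_C(X, y, n_hidden=50, n_fireflies=10, n_iter=20, seed=0):
    """Firefly Algorithm over log10(C): fireflies move toward brighter
    (higher validation accuracy) fireflies, with attractiveness decaying in distance."""
    rng = np.random.default_rng(seed)
    split = int(0.8 * len(X))                        # simple train/validation split
    pos = rng.uniform(-3, 3, n_fireflies)            # log10(C) in [1e-3, 1e3]

    def fitness(log_c):
        m = welm_train(X[:split], y[:split], n_hidden, 10.0 ** log_c, rng)
        return np.mean(welm_predict(X[split:], m) == y[split:])

    light = np.array([fitness(p) for p in pos])
    beta0, gamma, alpha = 1.0, 1.0, 0.2              # standard firefly parameters
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:              # move firefly i toward brighter j
                    r2 = (pos[i] - pos[j]) ** 2
                    pos[i] += beta0 * np.exp(-gamma * r2) * (pos[j] - pos[i]) \
                              + alpha * rng.uniform(-0.5, 0.5)
                    pos[i] = np.clip(pos[i], -3, 3)
                    light[i] = fitness(pos[i])
    best = int(np.argmax(light))
    return 10.0 ** pos[best], light[best]
```

In a Hadoop setting, a model tuned this way could score candidate nodes from features such as network latency, disk I/O, and CPU capability, and the scheduler could prefer higher-scoring placements; that usage, too, is an assumed reading of the abstract rather than a documented interface.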

Article Details

Section
Special Issue - Scalable Dew Computing for future generation IoT systems