Framework for Performance Enhancement of MPI based Application on Cloud
Abstract
Cloud technology is a major revolution in computing that has changed the way applications and resources are used. Elasticity is the key characteristic of the cloud: the required number of resources is provided as a service on a pay-as-you-go basis. This removes the large cost of buying, installing, and maintaining the resources. Cloud computing, with its highly scalable resources, can be a good platform for High-Performance Computing (HPC). HPC application performance depends heavily on the quantity of resources available, which makes the cloud a suitable candidate. However, the HPC community remains skeptical, and most users still consider cloud technology unsuitable for HPC applications. Virtualization technology, the foundation of the cloud, degrades application performance in its drive to improve resource utilization. The hypervisor layer and resource sharing among the virtual machines (VMs) hosted on the same node are the main causes of performance degradation in the cloud. The majority of HPC applications are message-passing applications built on the Message Passing Interface (MPI), and for these applications communication cost is the dominant factor in performance. Hosting these applications on the cloud leads to further performance degradation, because a process on a VM communicates through a virtual network interface, which in turn shares the network interface of the host machine with other VMs. MPI-based applications hosted on massively parallel processors (MPPs) already work in a bandwidth-shared environment, since multiple processes communicate over the same network. In the cloud, the per-node bandwidth available to each communication decreases further as the number of VMs increases.
To address these issues, we have built a framework that enhances the performance of MPI-based HPC applications on VMs through a suitable VM placement strategy and resource reservation policies, informed by resource availability and process communication patterns. A VM placement strategy that dynamically clusters VMs and gives high priority to shared-memory-based communication is proposed and tested. Results show that, with a medium number of processes, our placement strategy improves performance by around 70% for processes that exchange large volumes of data. With fewer processes, when a single physical node can hold all the VMs, the performance improvement is up to 500%.