Quality of Parallel and Distributed Programs and Systems
Abstract
The field of parallel computing dates back to the mid-1950s, when research labs started developing so-called supercomputers with the aim of significantly increasing performance, mainly the number of (floating point) operations a machine is able to perform per unit of time. Since then, significant advances in hardware and software technology have brought us to a point where the long-standing challenge of Tera-Flop computing was reached in 1998. While increasing performance is still a driving factor in parallel and distributed processing, there are nowadays many other challenges to be addressed in the field. Enabled by the growth of the Internet, the majority of desktop computers can now be seen as part of a huge distributed system, the World Wide Web. Advances in wireless networks extend the scope to a variety of mobile devices (including notebooks, PDAs, and mobile phones). Information is therefore distributed by nature; users require any-time, any-place access to information sources, to computing power, and to communication facilities. While performance in the sense defined above is still an important criterion in such systems, other issues, including correctness, reliability, security, ease of use, ubiquitous access, and intelligent services, must be considered in the development process itself. We refer to this extended notion of performance, covering all those aspects, by the term quality of parallel and distributed programs and systems.
This special issue publishes a selection of papers that were presented at the Austrian-Hungarian Workshop on Distributed and Parallel Systems (DAPSYS), organized jointly by the Austrian Computer Society and the MTA SZTAKI Computer and Automation Research Institute. Traditionally, DAPSYS has been organised every second year since 1996. The event has become more and more international, and the third one, in September 2000, was organised together with the EuroPVM/MPI conference at Balatonfüred. The two events attracted more than 120 scientists from all over the world, and 10 distinguished invited speakers gave excellent presentations on many aspects of new trends in parallel and distributed computing, including cluster and grid computing. All these talks were shared among the participants of both events, while separate proceedings were produced for each of the two events: Springer published the proceedings of EuroPVM/MPI and Kluwer published the proceedings of DAPSYS. In 2002, DAPSYS will move for the first time to Austria and will be held at the University of Linz.
Another tradition of DAPSYS is that some of the best papers belonging to a particular research field are selected for publication in a prestigious international journal as a special issue. The best papers of the 1st DAPSYS were published in Parallel Computing (Vol. 22, No. 13), and Future Generation Computer Systems Vol. 16, No. 6 was devoted to the 2nd DAPSYS workshop. The current collection of papers represents a selection from the 3rd DAPSYS workshop in the field of quality of parallel and distributed programs and systems.
To examine and guarantee the quality of parallel and distributed programs and systems, special models, metrics and tools are necessary. The six papers selected for this issue tackle various aspects of these problems.
The first paper, by J. Kovács, describes a distributed debugger, called DIWIDE, which can be used on both Windows NT and Unix platforms. DIWIDE has been designed to debug message passing parallel/distributed applications on supercomputers and clusters. It can be used as a standalone tool or as an integrated part of the P-GRADE graphical parallel programming environment. In both cases it provides step-by-step debugging facilities in every process of the parallel application. In the latter case it is also able to support icon-by-icon graphical debugging in every process and macrostep-by-macrostep debugging at the level of the whole application. For macrostep debugging it uses a novel breakpoint approach, called collective breakpoints, and DIWIDE automatically places these collective breakpoints at the boundaries of macrosteps in each process. It also provides a special Macrostep Control Panel to support automatic parallel debugging, which includes the automatic discovery of deadlock situations in a message passing parallel program. The paper describes all these functionalities of DIWIDE and also gives an insight into its distributed implementation.
The second paper (Malony and Shende) introduces the TAU performance system framework, which incorporates an instrumentation model, a performance measurement model, an execution model, a data analysis model, a presentation model and an integration model. With all these models TAU makes it possible to observe, analyze, and understand the performance of complex execution environments and complex software environments. TAU can support the performance analysis of hierarchical execution architectures where message passing, shared memory and thread concepts are applied at different levels of the hierarchy. After describing the TAU concept, the paper demonstrates the usage of TAU through performance case studies covering MPI, multi-threading, mixed-mode parallelism and task/data parallelism.
The third paper, by Podhorszki, has many similarities with the second one. It also describes a performance monitoring system, called GRM, which is an integrated part of the previously mentioned P-GRADE parallel programming environment. GRM was constructed with the following goals in mind: to support monitoring and visualising parallel programs at GRAPNEL level; to be part of an integrated development environment; to be portable and usable on heterogeneous clusters under UNIX operating systems (Irix, Solaris, Linux, DEC-Alpha, etc.); to support the performance evaluation and debugging of large, long-running programs with execution visualisation; to support the collection of both statistics and event traces; to avoid losing trace data when a program aborts, so that the execution up to the point of abortion can be visualised; and to keep the execution of the application separate from the development environment, i.e., an application can be developed on a local host while it is executed on a remote cluster (and visualised semi-on-line on the local host). The paper explains in detail how all these features are implemented in GRM.
In the fourth paper, authored by Brim, Flanery, Geist, Luethke, and Scott, a suite of tools developed at ORNL is presented for use in administering high performance cluster computing resources. The motivation for the work was to provide a vendor-independent solution to be made available to the cluster computing community to reduce the total cost of cluster ownership. In contrast to other administration tools, which are based on a single file system approach, the cluster command and control (C3) suite is based on a fully decentralised approach to promote scalability. The C3 tools can be used in a command line mode, but can also be integrated into other programs for both cluster administration purposes and user application purposes.
The fifth paper, presented by Zoltani, Satya-narayana, and Hisley from the HPC division at the US Army Research Laboratory, can be considered a case study in performance measurement. Six application benchmarks, including four NAS codes, were parallelized using OpenMP and MPI and were run on a 128-processor Origin 2000 supercomputer. The authors collected detailed profile data to understand the factors causing imperfect scalability. Their results show that load imbalance and the cost of remote accesses are the main factors limiting the speed-up of the OpenMP versions, while communication costs are the single major factor in the performance degradation of the MPI versions.
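The quantities discussed in this kind of scalability study can be illustrated with a minimal sketch. It is not taken from the paper; it only uses the textbook definitions of speed-up and parallel efficiency, together with one common (assumed) way of expressing load imbalance as the ratio of the slowest process time to the mean process time. All function names and the numbers in the usage lines are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speed-up divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


def load_imbalance(per_proc_times):
    """Load imbalance as (max time / mean time) - 1.

    This is one common convention, assumed here for illustration;
    0.0 means the work is perfectly balanced across processes.
    """
    mean_time = sum(per_proc_times) / len(per_proc_times)
    return max(per_proc_times) / mean_time - 1.0


# Hypothetical measurements, chosen only to make the arithmetic easy to follow.
s = speedup(128.0, 2.0)           # 64.0
e = efficiency(128.0, 2.0, 128)   # 0.5
imb = load_imbalance([1.0, 1.0, 1.0, 3.0])  # max 3.0, mean 1.5
```

With these made-up numbers, a speed-up of 64 on 128 processors corresponds to an efficiency of 0.5, which is the kind of imperfect scalability that detailed profile data is meant to explain.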
The last paper can be considered an outlook on future performance issues raised by the Internet, the Web, e-commerce, etc. The authors, Haring, Hlavacs and Hummel, claim that conventional performance metrics like response time and throughput are not sufficient to characterize the performance of Internet-based distributed systems. Their paper describes several workload models (renewal models, Markov models, linear stochastic models, self-similar traffic models, user behavior models) to tackle the problem. The authors take both wired and wireless networks into consideration. First they deal with performance issues and open problems of Web servers, then they discuss problems of managing the increasing Internet traffic at the network layer. They investigate open problems and possible solutions concerning multicast traffic and servers. They also consider the user's view, addressing new ways of hiding the complexity of distributed systems and permitting transparent execution and data localization. Finally, they discuss open problems of wireless mobile networks and how to cope with intrinsic problems like limited radio and energy resources.
This collection of papers forms a representative set of issues concerning the correctness and performance of parallel and distributed systems. The papers contain both theoretical and practical approaches to solving these problems, and also present practical tools and environments for the analysis and measurement of supercomputers and clusters.
We would like to thank the programme committee of DAPSYS and all the referees for their valuable work and comments. We want to acknowledge the assistance and support of Marcin Paprzycki, the editor-in-chief of the journal, without whom this issue would not have appeared. Finally, we are grateful to all the Austrian (OEAAD) and Hungarian (Ministry of Education, Foundation for the Technological Progress of the Industry) authorities and to the companies (Compaq Hungary, Sun Microsystems Hungary, IBM Hungary and Silicon Computers Ltd.) that supported and sponsored us in organizing the 3rd DAPSYS event.
Péter Kacsuk and Gabriele Kotsis