Replication Techniques in Distributed Systems


Wanlei Zhou


A. A. Helal, A. A. Heddaya, and B. B. Bhargava
Kluwer Academic Publishers, Boston, 1996, 176 pp.
ISBN 0-7923-9800-9, $118.50

Computing in the 1990s has reached the stage of distributed computing. The basis of this form of computing is a distributed computing system, which is built on the following three components: (a) personal computers, (b) local and fast wide area networks, and (c) system and application software.

By amalgamating computers and networks into a single computing system and providing appropriate system software, a distributed computing system makes it possible to share information and peripheral resources. Furthermore, such systems improve performance, both for the computing system as a whole and for individual users, through parallel execution of programs, load balancing and sharing, and replication of programs and data. Distributed computing systems are also characterized by enhanced availability and increased reliability.

Replication is the key to providing high availability, fault tolerance, and enhanced performance in a distributed computing system. As companies move toward systems that are more open and distributed, replication is becoming increasingly important for providing data and services that are current, correct, and available, which is a key factor in maintaining a competitive advantage over rivals.

However, replication also raises some serious challenges. For example, when a replica is updated, how do we propagate the update to the other replicas? If multiple replicas are updated simultaneously and some of the updates conflict with one another, how do we resolve the conflict? How do we deal with a replica that goes down and subsequently recovers? How do we deal with network partitions? Not surprisingly, considerable research effort has been directed towards solving these challenges. However, most of the research results are scattered among many journals, conference proceedings, theses, and technical reports. The book by Helal, Heddaya, and Bhargava gathers the best of these research results into a coherent collection that includes definitions, theoretical background, algorithms, annotated bibliographies, descriptions of commercial and experimental prototype systems, and more than 200 references.

This book classifies the entities that can be replicated in a distributed computing system into the following categories: (a) data, (b) processes, (c) objects, and (d) messages. After an introduction to the goals and main approaches of replication techniques, the book uses individual chapters to describe the major methods for each of these replication categories. The main focus of this part, however, is on the replication of data. The book also contains two chapters on replication issues in heterogeneous, mobile, and large-scale systems, and on the future of replication techniques.

Apart from the normal chapters, the book contains a rich set of appendices that are very useful for further study of replication techniques. These appendices include brief descriptions of two dozen commercial and experimental systems that use replication techniques, annotated bibliographies of selected literature on various replication topics, and an introduction to serializability theory. Each annotated bibliography contains an overview of the specific topic and is written by an invited expert in that field. Together, the normal chapters and appendices cover the entire spectrum of replication, from definitions, concepts, theories, algorithms, and techniques to systems.

The book serves as an excellent introduction and roadmap to the issues of replication. Although it does not provide a detailed description of every replication technique and system, the book covers most of the fundamental work, allowing the reader to understand the roots of these techniques and systems and to find out where to look for the details. This book is a valuable reference for anyone studying replication techniques or implementing replication systems. Specifically, it can be very useful to postgraduate students, practitioners, and beginning researchers in this exciting field.

Wanlei Zhou,
School of Computing and Mathematics,
Deakin University


Book review