Parallel Database Techniques
Mahdi Abdelguerfi and Kam-Fai Wong (editors)
IEEE Computer Society, Los Alamitos, CA, 1998, 232 pages
ISBN 0-8186-8398-8, $30.00 Members / $40.00 List
This book is a collection of specialized reports on parallel database techniques, written by a group of experts in the design and use of parallel database systems and architectures. Its purpose is to inform designers and users of large database systems about the existing capabilities of software and hardware to support the parallelization of database management systems (DBMS). The authors have the merit of communicating their results in a highly technical yet straightforward manner.
After a brief introduction, the book is structured in three main parts: Request Manager, Parallel Machine Architecture, and Partitioned Data Store.
Parallelism in databases arises naturally from their underlying data model, and relational data models (RDM) in particular are well suited to parallelization. Parallelism in an RDM is obtained mainly from the independence between two tables and between two tuples within a table. The request manager part of the book comprises four chapters. Parallel query optimization techniques are presented first, with an algorithm for the XPRS shared-memory parallel database system, followed naturally by a novel approach to parallel join based on page connectivity information. Performance evaluation tools for parallel database systems come next; the analysis is supported by an example in which the Software Testpilot is used to assess the performance of Oracle V7.x on a 64-node Ncube.
In the area of data management, two issues are addressed: load placement and recovery. Load placement is a key issue in parallel techniques, and an analysis of several load placement schemes is included. Recovery in a client-server database system is a very complex process whose success depends mainly on avoiding inconsistent database states. The authors introduce a framework for recovery analysis based on the ACTA formalism and compare the recovery requirements of three client-server systems: ESM CS, ARIES/CS and a Shared Nothing with Disks (CD).
Three chapters are related to parallel machine architecture. The first, on parallel strategies for a petabyte multimedia database computer, analyzes the new concepts and technical challenges raised by this new database paradigm. It introduces a Multimedia Data Warehouse concept, illustrated by an application from a National Institute of Standards and Technology (NIST) medical knowledge bank program that uses the concepts and multimedia object/relational database systems described in the chapter. The analysis of a petabyte system is based on the Teradata Multimedia Database System (Teradata MM), the Teradata Multimedia Object Server architecture and the Teradata Relational Database System, together with parallel infrastructure and computer platforms such as the DBC/1012 and the WorldMark 5100M.
Work on MEDUSA, a prototype self-organizing multiprocessor database machine, is presented next. MEDUSA is a parallel data server based on a shared nothing (SN) architecture, and performance testing shows that it is well suited as a research prototype and as a backend server to currently available conventional processors. MEDUSA is intended as an economical data server built from off-the-shelf INMOS Transputer components.
The next chapter introduces the system software of the Super Database Computer SDC-II, a highly parallel database server designed to process large-scale, complex queries. SDC-II is compared realistically with commercial products, and despite its limitations due to disk capacity, the benchmark tests show that it is based on an efficient and promising approach.
The last chapter is an analysis of data placement in parallel database systems. The layout of the data across the processors can have a significant impact on the performance of a parallel DBMS, and five different data placement strategies are considered. The authors present a study carried out with STEADY (System Throughput Estimator for Advanced Database Systems), an analytical tool for estimating the performance of SN parallel DBMSs. They focus mainly on data placement strategies at the processor level.
Database technology is now expanding, and the need to handle very large heterogeneous databases makes parallelization the only valid approach for the next generation of DBMSs. The high potential of parallel databases and the rapidly increasing size of databases require that both vendors and users have a deep understanding of parallel database systems. The book is intended for well-informed DBMS specialists and for computer engineers who design parallel architectures able to support fast and frequent database transactions.
The background and training required to read this book are at the level of graduate studies combined with practical experience, above all knowledge of commercial products in database management, parallel architecture and networking. The book covers a broad range of subjects in the area of parallel databases and is a must-read for implementers and designers of modern large databases. Although a great reference, it is not intended as a text for a course.
For the less informed reader, this is not a self-contained book in the usual sense, mainly because of its level of technicality and the lack of introductory material on each subject presented. The results communicated are new and should apply well to near-future approaches in parallel database transactions and architecture design.
Michelle Pal, Los Alamos National Laboratory
Los Alamos, New Mexico