Title:
Fault tolerance in distributed systems
Personal Author:
Publication Information:
Englewood Cliffs, N.J. : Prentice Hall, 1994
ISBN:
9780133013672

Item Barcode:
30000002955346
Call Number:
QA76.9.F38 J25 1994
Material Type:
Open Access Book
Item Category 1:
Book

Summary

Fault tolerance is an approach by which the reliability of a computer system can be increased beyond what traditional methods can achieve. While hardware-supported fault tolerance is well documented, the newer software-supported fault tolerance techniques have remained scattered throughout the literature. Comprehensive and self-contained, this book organizes that body of knowledge with a focus on fault tolerance in distributed systems. (The uniprocess case is treated as a special case of a distributed system.) KEY TOPICS: Treats fault-tolerant distributed systems as consisting of levels of abstraction, each providing different fault-tolerant services. Intended for researchers and practitioners working in the area of fault tolerance.


Table of Contents

1 Introduction
Basic Concepts and Definitions
Phases in Fault Tolerance
Overview of Hardware Fault Tolerance
Reliability and Availability
Summary
2 Distributed Systems
System Model
Interprocess Communication
Ordering of Events and Logical Clocks
Execution Model and System State
Summary
3 Basic Building Blocks
Byzantine Agreement
Synchronized Clocks
Stable Storage
Fail-Stop Processors
Failure Detection and Fault Diagnosis
Reliable Message Delivery
Summary
4 Reliable, Atomic, and Causal Broadcast
Reliable Broadcast
Atomic Broadcast
Causal Broadcast
5 Recovering a Consistent State
Asynchronous Checkpointing and Rollback
Distributed Checkpointing
Summary
6 Atomic Actions
Atomic Actions and Serializability
Atomic Actions in a Centralized System
Commit Protocols
Atomic Actions on Decentralized Data
Summary
7 Data Replication and Resiliency
Optimistic Approaches
Primary Site Approach
Resiliency with Active Replicas
Voting
Degree of Replication
Summary
8 Process Resiliency
Resilient Remote Procedure Call
Resiliency with Asynchronous Communication
Resiliency with Synchronous Message Passing
Total Failure and Last Process to Fail
Summary
9 Software Design Faults
Approaches for Uniprocess Software
Backward Recovery in Concurrent Systems
Forward Recovery in Concurrent Systems
Summary
Bibliography