High availability and disaster recovery : concepts, design, implementation

Personal Author:

Schmidt, Klaus

Publication Information:

Berlin : Springer-Verlag, 2006

ISBN:

9783540244608

Subject Term:

Data recovery (Computer science)

Electronic data processing -- Backup processing alternatives

Redundancy (Engineering)

Available:

Library	Item Barcode	Call Number	Material Type	Item Category 1	Status
PSZ JB	30000010134249	QA76.9.D348 S35 2006	Open Access Book	Book	Unknown
Razak School	30000010160733	QA76.9.D348 S35 2006	Open Access Book	Book	Unknown

Summary:

Companies and other organizations depend more than ever on the availability of their information technology, and most mission-critical business processes are IT-based. Business continuity, the ability to do business under any circumstances, is an essential requirement that modern companies face. High availability and disaster recovery are IT's contributions to fulfilling this requirement. Companies will face such demands to an even greater extent in the future, since their credit ratings will be lower without such precautions.

Both high availability and disaster recovery are realized through redundant systems. Redundancy can and should be implemented at different abstraction levels: from the hardware, the operating system, and middleware components up to a backup computing center in case of a disaster. This book presents requirements, concepts, and realizations of redundant systems at all abstraction levels; all examples refer to UNIX and Linux systems.

Table of Contents:

1 Introduction	p. 1
1.1 Audience	p. 2
1.2 Roadmap of This Book	p. 4
1.3 Real-World Examples	p. 8
2 Elementary Concepts	p. 13
2.1 Business Issues	p. 14
2.1.1 Business Continuity as the Overall Goal	p. 16
2.1.2 Regulatory Compliance and Risk Management	p. 16
2.2 System and Outage Categorization	p. 17
2.3 High Availability - Handling Minor Outages	p. 22
2.3.1 Availability	p. 24
2.3.2 Reliability	p. 25
2.3.3 Serviceability	p. 25
2.4 Disaster Recovery - Handling Major Outages	p. 26
2.5 Quantifying Availability: 99.9... % and Reality	p. 29
2.6 Service Level Agreements	p. 31
2.7 Basic Approach: Robustness and Redundancy	p. 34
2.8 Layered Solution with Multiple Precautions	p. 38
2.9 Summary	p. 39
3 Architecture	p. 41
3.1 Objectives	p. 45
3.2 Conceptual Model	p. 48
3.3 System Model	p. 51
4 System Design	p. 55
4.1 Base Concepts	p. 55
4.1.1 System Stack	p. 56
4.1.2 Redundancy and Replication	p. 61
4.1.3 Robustness and Simplicity	p. 74
4.1.4 Virtualization	p. 77
4.2 Solution Roadmap	p. 78
4.2.1 List Failure Scenarios	p. 79
4.2.2 Evaluate Failure Scenarios	p. 82
4.2.3 Map Scenarios to Requirements	p. 82
4.2.4 Design Solution	p. 85
4.2.5 Review Selected Solution Against Scenarios	p. 86
4.3 System Solution Patterns	p. 86
4.3.1 System Implementation Process	p. 87
4.3.2 Systems for All Process Steps	p. 87
4.3.3 Use Case: SAP Server	p. 89
5 Hardware	p. 99
5.1 Components and Computer Systems	p. 104
5.2 Disk Storage	p. 108
5.2.1 RAID - Redundant Array of Independent Disks	p. 109
5.2.2 Storage Systems	p. 119
5.2.3 SAN vs. NAS	p. 124
5.2.4 Journaling Is Essential for High Availability	p. 125
5.3 Virtualization of Resources	p. 126
5.4 Vendor Selection and Purchasing Decisions	p. 128
5.5 System Installation	p. 132
5.6 System Maintenance and Operations	p. 139
5.7 Making Our Own Statistics	p. 142
6 Operating Systems	p. 149
6.1 Failover Clusters	p. 151
6.1.1 How Does It Work?	p. 157
6.1.2 Failover Cluster Implementation Experiences	p. 166
6.2 Load-Balancing Clusters	p. 176
6.2.1 Load-Balancing Approaches	p. 178
6.2.2 Target Selection for Load Balancing	p. 181
6.3 Cluster and Server Consolidation	p. 183
6.3.1 Virtualization and Moore's Law	p. 183
6.3.2 Host Virtualization	p. 184
7 Databases and Middleware	p. 189
7.1 Middleware Categories	p. 191
7.2 Database Servers	p. 193
7.2.1 High-Availability Options for Database Servers	p. 199
7.2.2 Disaster Recovery for Databases	p. 204
7.3 Web Servers	p. 205
7.4 Application Servers	p. 208
7.5 Messaging Servers	p. 213
8 Applications	p. 215
8.1 Integration in a Cluster on the Operating System Level	p. 217
8.2 High Availability Through Middleware	p. 223
8.3 High Availability From Scratch	p. 225
8.4 Code Quality Is Important	p. 227
8.5 Testing for High Availability	p. 229
9 Infrastructure	p. 233
9.1 Network	p. 234
9.1.1 Network Devices	p. 238
9.1.2 LAN Segments	p. 240
9.1.3 Default Gateway	p. 248
9.1.4 Routing in LANs and WANs	p. 252
9.1.5 Firewalls and Network Address Translation	p. 258
9.1.6 Network Design for Disaster Recovery	p. 264
9.2 Infrastructure Services	p. 267
9.2.1 Dynamic Host Configuration Protocol (DHCP)	p. 267
9.2.2 Domain Name Service (DNS)	p. 271
9.2.3 Directory Server	p. 276
9.3 Backup and Restore	p. 283
9.4 Monitoring	p. 284
10 Disaster Recovery	p. 287
10.1 Concepts	p. 289
10.2 Approach	p. 291
10.3 Conceptual Design	p. 292
10.3.1 Scenarios for Major Outages	p. 293
10.3.2 Disaster-Recovery Scope	p. 295
10.3.3 Primary and Disaster-Recovery Sites	p. 297
10.3.4 State Synchronization	p. 298
10.3.5 Shared System, Hot or Cold Standby	p. 300
10.3.6 Time to Recovery - Failback to the Primary Site	p. 303
10.4 Solutions	p. 305
10.4.1 Metro Cluster	p. 306
10.4.2 Fast Restore	p. 309
10.4.3 Application-Level or Middleware-Level Clustering	p. 309
10.4.4 Application Data Mirroring	p. 310
10.4.5 Disk Mirroring	p. 317
10.4.6 Matching Configuration Changes	p. 317
10.5 Disaster-Recovery Tests	p. 318
10.5.1 Test Goals and Categories	p. 319
10.5.2 Organizational Test Context	p. 321
10.5.3 Quality Characteristics	p. 322
10.6 Holistic View - What Is Needed Besides Technology?	p. 322
10.6.1 Command Center and War Room	p. 323
10.6.2 Disaster-Recovery Emergency Pack	p. 323
10.7 A Prototypical Disaster-Recovery Project	p. 324
10.7.1 System Identification - the Primary Site	p. 326
10.7.2 Business Requirements and Project Goals	p. 331
10.7.3 Business View	p. 333
10.7.4 System Design	p. 336
10.7.5 Implementation	p. 345
10.8 Failover to Disaster-Recovery Site or Disaster-Recovery Systems	p. 351
10.8.1 General Approach	p. 351
10.8.2 Example Checklist for a Database Disaster-Recovery Server	p. 355
10.8.3 Failback to the Primary System	p. 357
A Reliability Calculations and Statistics	p. 359
A.1 Mathematical Basics	p. 360
A.2 Mean Time Between Failures and Annual Failure Rate	p. 362
A.3 Redundancy and Probability of Failures	p. 363
A.4 RAID Configurations	p. 365
A.5 Example Calculations	p. 372
A.6 Reliability over Time - the Bathtub Curve	p. 374
B Data Centers	p. 377
B.1 Room Installation	p. 378
B.2 Heat and Fire Control	p. 381
B.3 Power Control	p. 384
B.4 Computer Setup	p. 386
C Service Support Processes	p. 387
C.1 Incident Management	p. 388
C.2 Problem Management	p. 389
C.3 Configuration Management	p. 391
C.4 Change Management	p. 394
C.5 Release Management	p. 395
C.6 Information Gathering and Reporting	p. 397
References	p. 399
Index	p. 401