Title:
High availability and disaster recovery : concepts, design, implementation
Personal Author:
Publication Information:
Berlin : Springer-Verlag, 2006
ISBN:
9783540244608

Available:

Item Barcode    Call Number           Material Type     Item Category 1
30000010134249  QA76.9.D348 S35 2006  Open Access Book  Book
30000010160733  QA76.9.D348 S35 2006  Open Access Book  Book


Summary

Companies and other organizations depend more than ever on the availability of their information technology, and most mission-critical business processes are IT-based. Business continuity, the ability to do business under any circumstances, is an essential requirement for modern companies. High availability and disaster recovery are IT's contributions to fulfilling this requirement. Companies will face such demands to an even greater extent in the future, since their credit ratings will be lower without such precautions.

Both high availability and disaster recovery are realized through redundant systems. Redundancy can and should be implemented at different abstraction levels: from the hardware, the operating system, and middleware components up to the backup computing center used in case of a disaster. This book presents requirements, concepts, and realizations of redundant systems at all abstraction levels, and all examples refer to UNIX and Linux systems.


Table of Contents

1 Introduction p. 1
1.1 Audience p. 2
1.2 Roadmap of This Book p. 4
1.3 Real-World Examples p. 8
2 Elementary Concepts p. 13
2.1 Business Issues p. 14
2.1.1 Business Continuity as the Overall Goal p. 16
2.1.2 Regulatory Compliance and Risk Management p. 16
2.2 System and Outage Categorization p. 17
2.3 High Availability - Handling Minor Outages p. 22
2.3.1 Availability p. 24
2.3.2 Reliability p. 25
2.3.3 Serviceability p. 25
2.4 Disaster Recovery - Handling Major Outages p. 26
2.5 Quantifying Availability: 99.9... % and Reality p. 29
2.6 Service Level Agreements p. 31
2.7 Basic Approach: Robustness and Redundancy p. 34
2.8 Layered Solution with Multiple Precautions p. 38
2.9 Summary p. 39
3 Architecture p. 41
3.1 Objectives p. 45
3.2 Conceptual Model p. 48
3.3 System Model p. 51
4 System Design p. 55
4.1 Base Concepts p. 55
4.1.1 System Stack p. 56
4.1.2 Redundancy and Replication p. 61
4.1.3 Robustness and Simplicity p. 74
4.1.4 Virtualization p. 77
4.2 Solution Roadmap p. 78
4.2.1 List Failure Scenarios p. 79
4.2.2 Evaluate Failure Scenarios p. 82
4.2.3 Map Scenarios to Requirements p. 82
4.2.4 Design Solution p. 85
4.2.5 Review Selected Solution Against Scenarios p. 86
4.3 System Solution Patterns p. 86
4.3.1 System Implementation Process p. 87
4.3.2 Systems for All Process Steps p. 87
4.3.3 Use Case: SAP Server p. 89
5 Hardware p. 99
5.1 Components and Computer Systems p. 104
5.2 Disk Storage p. 108
5.2.1 RAID - Redundant Array of Independent Disks p. 109
5.2.2 Storage Systems p. 119
5.2.3 SAN vs. NAS p. 124
5.2.4 Journaling Is Essential for High Availability p. 125
5.3 Virtualization of Resources p. 126
5.4 Vendor Selection and Purchasing Decisions p. 128
5.5 System Installation p. 132
5.6 System Maintenance and Operations p. 139
5.7 Making Our Own Statistics p. 142
6 Operating Systems p. 149
6.1 Failover Clusters p. 151
6.1.1 How Does It Work? p. 157
6.1.2 Failover Cluster Implementation Experiences p. 166
6.2 Load-Balancing Clusters p. 176
6.2.1 Load-Balancing Approaches p. 178
6.2.2 Target Selection for Load Balancing p. 181
6.3 Cluster and Server Consolidation p. 183
6.3.1 Virtualization and Moore's Law p. 183
6.3.2 Host Virtualization p. 184
7 Databases and Middleware p. 189
7.1 Middleware Categories p. 191
7.2 Database Servers p. 193
7.2.1 High-Availability Options for Database Servers p. 199
7.2.2 Disaster Recovery for Databases p. 204
7.3 Web Servers p. 205
7.4 Application Servers p. 208
7.5 Messaging Servers p. 213
8 Applications p. 215
8.1 Integration in a Cluster on the Operating System Level p. 217
8.2 High Availability Through Middleware p. 223
8.3 High Availability From Scratch p. 225
8.4 Code Quality Is Important p. 227
8.5 Testing for High Availability p. 229
9 Infrastructure p. 233
9.1 Network p. 234
9.1.1 Network Devices p. 238
9.1.2 LAN Segments p. 240
9.1.3 Default Gateway p. 248
9.1.4 Routing in LANs and WANs p. 252
9.1.5 Firewalls and Network Address Translation p. 258
9.1.6 Network Design for Disaster Recovery p. 264
9.2 Infrastructure Services p. 267
9.2.1 Dynamic Host Configuration Protocol (DHCP) p. 267
9.2.2 Domain Name Service (DNS) p. 271
9.2.3 Directory Server p. 276
9.3 Backup and Restore p. 283
9.4 Monitoring p. 284
10 Disaster Recovery p. 287
10.1 Concepts p. 289
10.2 Approach p. 291
10.3 Conceptual Design p. 292
10.3.1 Scenarios for Major Outages p. 293
10.3.2 Disaster-Recovery Scope p. 295
10.3.3 Primary and Disaster-Recovery Sites p. 297
10.3.4 State Synchronization p. 298
10.3.5 Shared System, Hot or Cold Standby p. 300
10.3.6 Time to Recovery - Failback to the Primary Site p. 303
10.4 Solutions p. 305
10.4.1 Metro Cluster p. 306
10.4.2 Fast Restore p. 309
10.4.3 Application-Level or Middleware-Level Clustering p. 309
10.4.4 Application Data Mirroring p. 310
10.4.5 Disk Mirroring p. 317
10.4.6 Matching Configuration Changes p. 317
10.5 Disaster-Recovery Tests p. 318
10.5.1 Test Goals and Categories p. 319
10.5.2 Organizational Test Context p. 321
10.5.3 Quality Characteristics p. 322
10.6 Holistic View - What Is Needed Besides Technology? p. 322
10.6.1 Command Center and War Room p. 323
10.6.2 Disaster-Recovery Emergency Pack p. 323
10.7 A Prototypical Disaster-Recovery Project p. 324
10.7.1 System Identification - the Primary Site p. 326
10.7.2 Business Requirements and Project Goals p. 331
10.7.3 Business View p. 333
10.7.4 System Design p. 336
10.7.5 Implementation p. 345
10.8 Failover to Disaster-Recovery Site or Disaster-Recovery Systems p. 351
10.8.1 General Approach p. 351
10.8.2 Example Checklist for a Database Disaster-Recovery Server p. 355
10.8.3 Failback to the Primary System p. 357
A Reliability Calculations and Statistics p. 359
A.1 Mathematical Basics p. 360
A.2 Mean Time Between Failures and Annual Failure Rate p. 362
A.3 Redundancy and Probability of Failures p. 363
A.4 RAID Configurations p. 365
A.5 Example Calculations p. 372
A.6 Reliability over Time - the Bathtub Curve p. 374
B Data Centers p. 377
B.1 Room Installation p. 378
B.2 Heat and Fire Control p. 381
B.3 Power Control p. 384
B.4 Computer Setup p. 386
C Service Support Processes p. 387
C.1 Incident Management p. 388
C.2 Problem Management p. 389
C.3 Configuration Management p. 391
C.4 Change Management p. 394
C.5 Release Management p. 395
C.6 Information Gathering and Reporting p. 397
References p. 399
Index p. 401