Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010134249 | QA76.9.D348 S35 2006 | Open Access Book | Book |
30000010160733 | QA76.9.D348 S35 2006 | Open Access Book | Book |
Summary
Companies and other organizations depend more than ever on the availability of their information technology, and most mission-critical business processes are IT-based. Business continuity, the ability to do business under any circumstances, is an essential requirement facing modern companies. High availability and disaster recovery are IT's contributions to fulfilling this requirement. Companies will be confronted with such demands to an even greater extent in the future, since their credit ratings will be lower without such precautions.
Both high availability and disaster recovery are realized by redundant systems. Redundancy can and should be implemented on different abstraction levels: from the hardware, the operating system, and middleware components up to the backup computing center in case of a disaster. This book presents requirements, concepts, and realizations of redundant systems on all abstraction levels; all examples refer to UNIX and Linux systems.
Table of Contents
1 Introduction | p. 1 |
1.1 Audience | p. 2 |
1.2 Roadmap of This Book | p. 4 |
1.3 Real-World Examples | p. 8 |
2 Elementary Concepts | p. 13 |
2.1 Business Issues | p. 14 |
2.1.1 Business Continuity as the Overall Goal | p. 16 |
2.1.2 Regulatory Compliance and Risk Management | p. 16 |
2.2 System and Outage Categorization | p. 17 |
2.3 High Availability - Handling Minor Outages | p. 22 |
2.3.1 Availability | p. 24 |
2.3.2 Reliability | p. 25 |
2.3.3 Serviceability | p. 25 |
2.4 Disaster Recovery - Handling Major Outages | p. 26 |
2.5 Quantifying Availability: 99.9... % and Reality | p. 29 |
2.6 Service Level Agreements | p. 31 |
2.7 Basic Approach: Robustness and Redundancy | p. 34 |
2.8 Layered Solution with Multiple Precautions | p. 38 |
2.9 Summary | p. 39 |
3 Architecture | p. 41 |
3.1 Objectives | p. 45 |
3.2 Conceptual Model | p. 48 |
3.3 System Model | p. 51 |
4 System Design | p. 55 |
4.1 Base Concepts | p. 55 |
4.1.1 System Stack | p. 56 |
4.1.2 Redundancy and Replication | p. 61 |
4.1.3 Robustness and Simplicity | p. 74 |
4.1.4 Virtualization | p. 77 |
4.2 Solution Roadmap | p. 78 |
4.2.1 List Failure Scenarios | p. 79 |
4.2.2 Evaluate Failure Scenarios | p. 82 |
4.2.3 Map Scenarios to Requirements | p. 82 |
4.2.4 Design Solution | p. 85 |
4.2.5 Review Selected Solution Against Scenarios | p. 86 |
4.3 System Solution Patterns | p. 86 |
4.3.1 System Implementation Process | p. 87 |
4.3.2 Systems for All Process Steps | p. 87 |
4.3.3 Use Case: SAP Server | p. 89 |
5 Hardware | p. 99 |
5.1 Components and Computer Systems | p. 104 |
5.2 Disk Storage | p. 108 |
5.2.1 RAID - Redundant Array of Independent Disks | p. 109 |
5.2.2 Storage Systems | p. 119 |
5.2.3 SAN vs. NAS | p. 124 |
5.2.4 Journaling Is Essential for High Availability | p. 125 |
5.3 Virtualization of Resources | p. 126 |
5.4 Vendor Selection and Purchasing Decisions | p. 128 |
5.5 System Installation | p. 132 |
5.6 System Maintenance and Operations | p. 139 |
5.7 Making Our Own Statistics | p. 142 |
6 Operating Systems | p. 149 |
6.1 Failover Clusters | p. 151 |
6.1.1 How Does It Work? | p. 157 |
6.1.2 Failover Cluster Implementation Experiences | p. 166 |
6.2 Load-Balancing Clusters | p. 176 |
6.2.1 Load-Balancing Approaches | p. 178 |
6.2.2 Target Selection for Load Balancing | p. 181 |
6.3 Cluster and Server Consolidation | p. 183 |
6.3.1 Virtualization and Moore's Law | p. 183 |
6.3.2 Host Virtualization | p. 184 |
7 Databases and Middleware | p. 189 |
7.1 Middleware Categories | p. 191 |
7.2 Database Servers | p. 193 |
7.2.1 High-Availability Options for Database Servers | p. 199 |
7.2.2 Disaster Recovery for Databases | p. 204 |
7.3 Web Servers | p. 205 |
7.4 Application Servers | p. 208 |
7.5 Messaging Servers | p. 213 |
8 Applications | p. 215 |
8.1 Integration in a Cluster on the Operating System Level | p. 217 |
8.2 High Availability Through Middleware | p. 223 |
8.3 High Availability From Scratch | p. 225 |
8.4 Code Quality Is Important | p. 227 |
8.5 Testing for High Availability | p. 229 |
9 Infrastructure | p. 233 |
9.1 Network | p. 234 |
9.1.1 Network Devices | p. 238 |
9.1.2 LAN Segments | p. 240 |
9.1.3 Default Gateway | p. 248 |
9.1.4 Routing in LANs and WANs | p. 252 |
9.1.5 Firewalls and Network Address Translation | p. 258 |
9.1.6 Network Design for Disaster Recovery | p. 264 |
9.2 Infrastructure Services | p. 267 |
9.2.1 Dynamic Host Configuration Protocol (DHCP) | p. 267 |
9.2.2 Domain Name Service (DNS) | p. 271 |
9.2.3 Directory Server | p. 276 |
9.3 Backup and Restore | p. 283 |
9.4 Monitoring | p. 284 |
10 Disaster Recovery | p. 287 |
10.1 Concepts | p. 289 |
10.2 Approach | p. 291 |
10.3 Conceptual Design | p. 292 |
10.3.1 Scenarios for Major Outages | p. 293 |
10.3.2 Disaster-Recovery Scope | p. 295 |
10.3.3 Primary and Disaster-Recovery Sites | p. 297 |
10.3.4 State Synchronization | p. 298 |
10.3.5 Shared System, Hot or Cold Standby | p. 300 |
10.3.6 Time to Recovery - Failback to the Primary Site | p. 303 |
10.4 Solutions | p. 305 |
10.4.1 Metro Cluster | p. 306 |
10.4.2 Fast Restore | p. 309 |
10.4.3 Application-Level or Middleware-Level Clustering | p. 309 |
10.4.4 Application Data Mirroring | p. 310 |
10.4.5 Disk Mirroring | p. 317 |
10.4.6 Matching Configuration Changes | p. 317 |
10.5 Disaster-Recovery Tests | p. 318 |
10.5.1 Test Goals and Categories | p. 319 |
10.5.2 Organizational Test Context | p. 321 |
10.5.3 Quality Characteristics | p. 322 |
10.6 Holistic View - What Is Needed Besides Technology? | p. 322 |
10.6.1 Command Center and War Room | p. 323 |
10.6.2 Disaster-Recovery Emergency Pack | p. 323 |
10.7 A Prototypical Disaster-Recovery Project | p. 324 |
10.7.1 System Identification - the Primary Site | p. 326 |
10.7.2 Business Requirements and Project Goals | p. 331 |
10.7.3 Business View | p. 333 |
10.7.4 System Design | p. 336 |
10.7.5 Implementation | p. 345 |
10.8 Failover to Disaster-Recovery Site or Disaster-Recovery Systems | p. 351 |
10.8.1 General Approach | p. 351 |
10.8.2 Example Checklist for a Database Disaster-Recovery Server | p. 355 |
10.8.3 Failback to the Primary System | p. 357 |
A Reliability Calculations and Statistics | p. 359 |
A.1 Mathematical Basics | p. 360 |
A.2 Mean Time Between Failures and Annual Failure Rate | p. 362 |
A.3 Redundancy and Probability of Failures | p. 363 |
A.4 RAID Configurations | p. 365 |
A.5 Example Calculations | p. 372 |
A.6 Reliability over Time - the Bathtub Curve | p. 374 |
B Data Centers | p. 377 |
B.1 Room Installation | p. 378 |
B.2 Heat and Fire Control | p. 381 |
B.3 Power Control | p. 384 |
B.4 Computer Setup | p. 386 |
C Service Support Processes | p. 387 |
C.1 Incident Management | p. 388 |
C.2 Problem Management | p. 389 |
C.3 Configuration Management | p. 391 |
C.4 Change Management | p. 394 |
C.5 Release Management | p. 395 |
C.6 Information Gathering and Reporting | p. 397 |
References | p. 399 |
Index | p. 401 |