Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010327990 | QA76.585 B394 2012 | Open Access Book | Book |
Summary
A holistic approach to service reliability and availability of cloud computing
Reliability and Availability of Cloud Computing provides IS/IT system and solution architects, developers, and engineers with the knowledge needed to assess the impact of virtualization and cloud computing on service reliability and availability. It reveals how to select the most appropriate design for reliability diligence to assure that user expectations are met.
Organized in three parts (basics, risk analysis, and recommendations), this resource is accessible to readers of diverse backgrounds and experience levels. Numerous examples and more than 100 figures throughout the book help readers visualize problems to better understand the topic--and the authors present risks and options in bulleted lists that can be applied directly to specific applications/problems.
Special features of this book include:
- Rigorous analysis of the reliability and availability risks that are inherent in cloud computing
- Simple formulas that explain the quantitative aspects of reliability and availability
- Enlightening discussions of the ways in which virtualized applications and cloud deployments differ from traditional system implementations and deployments
- Specific recommendations for developing reliable virtualized applications and cloud-based solutions

Reliability and Availability of Cloud Computing is the guide for IS/IT staff in business, government, academia, and non-governmental organizations who are moving their applications to the cloud. It is also an important reference for professionals in technical sales, product management, and quality management, as well as software and quality engineers looking to broaden their expertise.
Author Notes
ERIC BAUER is a reliability engineering manager in the Software, Solutions and Services Group of Alcatel-Lucent. The holder of more than a dozen U.S. patents, he is the author of Design for Reliability: Information and Computer-Based Systems, Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-Based Systems, and Practical System Reliability, also available from Wiley-IEEE Press.
RANDEE ADAMS is a consulting member of technical staff in the Software, Solutions and Services Group of Alcatel-Lucent and the coauthor of Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-Based Systems.
Table of Contents
Figures | p. xvii |
Tables | p. xxi |
Equations | p. xxiii |
Introduction | p. xxv |
I Basics | p. 1 |
1 Cloud Computing | p. 3 |
1.1 Essential Cloud Characteristics | p. 4 |
1.2 Common Cloud Characteristics | p. 6 |
1.3 But What, Exactly, Is Cloud Computing? | p. 7 |
1.4 Service Models | p. 9 |
1.5 Cloud Deployment Models | p. 11 |
1.6 Roles in Cloud Computing | p. 12 |
1.7 Benefits of Cloud Computing | p. 14 |
1.8 Risks of Cloud Computing | p. 15 |
2 Virtualization | p. 16 |
2.1 Background | p. 16 |
2.2 What Is Virtualization? | p. 17 |
2.3 Server Virtualization | p. 19 |
2.4 VM Lifecycle | p. 23 |
2.5 Reliability and Availability Risks of Virtualization | p. 28 |
3 Service Reliability and Service Availability | p. 29 |
3.1 Errors and Failures | p. 30 |
3.2 Eight-Ingredient Framework | p. 31 |
3.3 Service Availability | p. 34 |
3.4 Service Reliability | p. 43 |
3.5 Service Latency | p. 46 |
3.6 Redundancy and High Availability | p. 50 |
3.7 High Availability and Disaster Recovery | p. 56 |
3.8 Streaming Services | p. 58 |
3.9 Reliability and Availability Risks of Cloud Computing | p. 62 |
II Analysis | p. 63 |
4 Analyzing Cloud Reliability and Availability | p. 65 |
4.1 Expectations for Service Reliability and Availability | p. 65 |
4.2 Risks of Essential Cloud Characteristics | p. 66 |
4.3 Impacts of Common Cloud Characteristics | p. 70 |
4.4 Risks of Service Models | p. 72 |
4.5 IT Service Management and Availability Risks | p. 74 |
4.6 Outage Risks by Process Area | p. 80 |
4.7 Failure Detection Considerations | p. 83 |
4.8 Risks of Deployment Models | p. 87 |
4.9 Expectations of IaaS Data Centers | p. 87 |
5 Reliability Analysis of Virtualization | p. 90 |
5.1 Reliability Analysis Techniques | p. 90 |
5.2 Reliability Analysis of Virtualization Techniques | p. 95 |
5.3 Software Failure Rate Analysis | p. 100 |
5.4 Recovery Models | p. 101 |
5.5 Application Architecture Strategies | p. 108 |
5.6 Availability Modeling of Virtualized Recovery Options | p. 110 |
6 Hardware Reliability, Virtualization, and Service Availability | p. 116 |
6.1 Hardware Downtime Expectations | p. 116 |
6.2 Hardware Failures | p. 117 |
6.3 Hardware Failure Rate | p. 119 |
6.4 Hardware Failure Detection | p. 121 |
6.5 Hardware Failure Containment | p. 122 |
6.6 Hardware Failure Mitigation | p. 122 |
6.7 Mitigating Hardware Failures via Virtualization | p. 124 |
6.8 Virtualized Networks | p. 127 |
6.9 MTTR of Virtualized Hardware | p. 129 |
6.10 Discussion | p. 131 |
7 Capacity and Elasticity | p. 132 |
7.1 System Load Basics | p. 132 |
7.2 Overload, Service Reliability, and Service Availability | p. 135 |
7.3 Traditional Capacity Planning | p. 136 |
7.4 Cloud and Capacity | p. 137 |
7.5 Managing Online Capacity | p. 144 |
7.6 Capacity-Related Service Risks | p. 147 |
7.7 Capacity Management Risks | p. 153 |
7.8 Security and Service Availability | p. 157 |
7.9 Architecting for Elastic Growth and Degrowth | p. 162 |
8 Service Orchestration Analysis | p. 164 |
8.1 Service Orchestration Definition | p. 164 |
8.2 Policy-Based Management | p. 166 |
8.3 Cloud Management | p. 168 |
8.4 Service Orchestration's Role in Risk Mitigation | p. 169 |
9 Geographic Distribution, Georedundancy, and Disaster Recovery | p. 174 |
9.1 Geographic Distribution versus Georedundancy | p. 175 |
9.2 Traditional Disaster Recovery | p. 175 |
9.3 Virtualization and Disaster Recovery | p. 177 |
9.4 Cloud Computing and Disaster Recovery | p. 178 |
9.5 Georedundancy Recovery Models | p. 180 |
9.6 Cloud and Traditional Collateral Benefits of Georedundancy | p. 180 |
9.7 Discussion | p. 182 |
III Recommendations | p. 183 |
10 Applications, Solutions, and Accountability | p. 185 |
10.1 Application Configuration Scenarios | p. 185 |
10.2 Application Deployment Scenario | p. 187 |
10.3 System Downtime Budgets | p. 188 |
10.4 End-to-End Solutions Considerations | p. 197 |
10.5 Attributability for Service Impairments | p. 201 |
10.6 Solution Service Measurement | p. 204 |
10.7 Managing Reliability and Service of Cloud Computing | p. 207 |
11 Recommendations for Architecting a Reliable System | p. 209 |
11.1 Architecting for Virtualization and Cloud | p. 209 |
11.2 Disaster Recovery | p. 216 |
11.3 IT Service Management Considerations | p. 217 |
11.4 Many Distributed Clouds versus Fewer Huge Clouds | p. 224 |
11.5 Minimizing Hardware-Attributed Downtime | p. 225 |
11.6 Architectural Optimizations | p. 231 |
12 Design for Reliability of Virtualized Applications | p. 244 |
12.1 Design for Reliability | p. 244 |
12.2 Tailoring DfR for Virtualized Applications | p. 246 |
12.3 Reliability Requirements | p. 248 |
12.4 Qualitative Reliability Analysis | p. 256 |
12.5 Quantitative Reliability Budgeting and Modeling | p. 259 |
12.6 Robustness Testing | p. 260 |
12.7 Stability Testing | p. 267 |
12.8 Field Performance Analysis | p. 268 |
12.9 Reliability Roadmap | p. 269 |
12.10 Hardware Reliability | p. 270 |
13 Design for Reliability of Cloud Solutions | p. 271 |
13.1 Solution Design for Reliability | p. 271 |
13.2 Solution Scope and Expectations | p. 273 |
13.3 Reliability Requirements | p. 275 |
13.4 Solution Modeling and Analysis | p. 279 |
13.5 Element Reliability Diligence | p. 285 |
13.6 Solution Testing and Validation | p. 285 |
13.7 Track and Analyze Field Performance | p. 288 |
13.8 Other Solution Reliability Diligence Topics | p. 292 |
14 Summary | p. 296 |
14.1 Service Reliability and Service Availability | p. 297 |
14.2 Failure Accountability and Cloud Computing | p. 299 |
14.3 Factoring Service Downtime | p. 301 |
14.4 Service Availability Measurement Points | p. 303 |
14.5 Cloud Capacity and Elasticity Considerations | p. 306 |
14.6 Maximizing Service Availability | p. 306 |
14.7 Reliability Diligence | p. 309 |
14.8 Concluding Remarks | p. 310 |
Abbreviations | p. 311 |
References | p. 314 |
About the authors | p. 318 |
Index | p. 319 |