Title:
Harness the power of Big Data : the IBM Big Data platform
Publication Information:
New York ; Singapore : McGraw-Hill, c2013
Physical Description:
xxx, 248 p. : ill. ; 23 cm
ISBN:
9780071808170
Abstract:
"Learn all about IBM's enterprise-class end-to-end Big Data platform. Boost your Big Data IQ : details on in-motion and at-rest analytics, data asset discovery, integration, and governance. Get details surrounding the most common Big Data use cases that are transforming organizations today. Learn how to make down-stream Big Data projects faster to deploy ... and less risky. Gain confidence in your Big Data projects with an end-to-end tour of accelerators, tool sets, and samples to get you going ... FAST!"-- cover
Available:*
Item Barcode:
30000010301271
Call Number:
QA76.9.D5 H37 2013
Material Type:
Open Access Book
Item Category 1:
Book
Summary

Boost your Big Data IQ! Gain insight into how to govern and consume IBM's unique in-motion and at-rest Big Data analytic capabilities

Big Data represents a new era of computing--an inflection point of opportunity where data in any format may be explored and utilized for breakthrough insights--whether that data is in-place, in-motion, or at-rest. IBM is uniquely positioned to help clients navigate this transformation. This book reveals how IBM is infusing open source Big Data technologies with IBM innovation, resulting in a platform capable of "changing the game."

The four defining characteristics of Big Data--volume, variety, velocity, and veracity--are discussed. You'll understand how IBM is fully committed to Hadoop and integrating it into the enterprise. Hear about how organizations are taking inventories of their existing Big Data assets, with search capabilities that help organizations discover what they could already know, and extend their reach into new data territories for unprecedented model accuracy and discovery.

In this book you will learn not only about the technologies that make up the IBM Big Data platform, but also when to use its purpose-built engines for analytics on data in-motion and data at-rest. You'll also gain an understanding of how and when to govern Big Data, and how IBM's industry-leading InfoSphere integration and governance portfolio helps you understand, govern, and effectively utilize Big Data. Industry use cases are also included in this practical guide.


Author Notes

Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group's Information Management division and additionally leads the World-Wide Competitive Database and Big Data Technical Sales Acceleration teams.

Dirk deRoos, B.Sc., B.A., is IBM's World-Wide Technical Sales Leader for IBM InfoSphere BigInsights. He spent the past two years helping customers with BigInsights and Apache Hadoop, identifying architecture fit, and advising early stage projects in dozens of customer engagements.

Krishnan Parasuraman, B.Sc., M.Sc., is part of IBM's Big Data industry solutions team and serves as the CTO for Digital Media. In his role, Krishnan works very closely with customers in an advisory capacity, driving Big Data solution architectures and best practices for the management of Internet-scale analytics.

Thomas Deutsch, B.A., M.B.A., is a Program Director for IBM's Big Data team. He played a formative role in the transition of Hadoop-based technology from IBM Research to IBM Software Group and continues to be involved with IBM Research around Big Data.

David Corrigan, B.A., M.B.A., is currently the Director of Product Marketing for IBM's InfoSphere portfolio, which is focused on managing trusted information. His primary focus is driving the messaging and strategy for the InfoSphere portfolio of information integration, data quality, master data management (MDM), data lifecycle management, and data privacy and security.

James Giles, BSEE, B.Math, MSEE, Ph.D., is an IBM Distinguished Engineer and currently a Senior Development Manager for the IBM InfoSphere BigInsights and IBM InfoSphere Streams Big Data products.


Table of Contents

Foreword p. xvii
Preface p. xxi
Acknowledgments p. xxv
About This Book p. xxvii
Part I The Big Deal About Big Data
1 What Is Big Data? p. 3
Why Is Big Data Important? p. 3
Now, the "What Is Big Data?" Part p. 4
Brought to You by the Letter V: How We Define Big Data p. 9
What About My Data Warehouse in a Big Data World? p. 15
Wrapping It Up p. 19
2 Applying Big Data to Business Problems: A Sampling of Use Cases p. 21
When to Consider a Big Data Solution p. 21
Before We Start: Big Data, Jigsaw Puzzles, and Insight p. 24
Big Data Use Cases: Patterns for Big Data Deployment p. 26
You Spent the Money to Instrument It--Now Exploit It! p. 26
IT for IT: Data Center, Machine Data, and Log Analytics p. 28
What, Why, and Who? Social Media: Analytics p. 30
Understanding Customer Sentiment p. 31
Social Media Techniques Make the World Your Oyster p. 33
Customer State: Or, Don't Try to Upsell Me When I Am Mad p. 34
Fraud Detection: "Who Buys an Engagement Ring at 4 a.m.?" p. 36
Liquidity and Risk: Moving from Aggregate to Individual p. 38
Wrapping It Up p. 39
3 Boost Your Big Data IQ: The IBM Big Data Platform p. 41
The New Era of Analytics p. 41
Key Considerations for the Analytic Enterprise p. 43
The Big Data Platform Manifesto p. 45
IBM's Strategy for Big Data and Analytics p. 49
1 Sustained Investments in Research and Acquisitions p. 49
2 Strong Commitment to Open Source Efforts and a Fostering of Ecosystem Development p. 50
3 Support Multiple Entry Points to Big Data p. 52
A Flexible, Platform-Based Approach to Big Data p. 56
Wrapping It Up p. 59
Part II Analytics for Big Data at Rest
4 A Big Data Platform for High-Performance Deep Analytics: IBM PureData Systems p. 63
Netezza's Design Principles p. 66
Appliance Simplicity: Minimize the Human Effort p. 66
Hardware Acceleration: Process Analytics Close to the Data Store p. 67
Balanced, Massively Parallel Architecture: Deliver Linear Scalability p. 67
Modular Design: Support Flexible Configurations and Extreme Scalability p. 67
What's in the Box? The Netezza Appliance Architecture Overview p. 68
A Look Inside the Netezza Appliance p. 69
The Secret Sauce: FPGA-Assisted Analytics p. 72
Query Orchestration in Netezza p. 73
Platform for Advanced Analytics p. 77
Extending the Netezza Analytics Platform with Hadoop p. 79
Customers' Success Stories: The Netezza Experience p. 81
T-Mobile: Delivering Extreme Performance with Simplicity at the Petabyte Scale p. 82
State University of New York: Using Analytics to Help Find a Cure for Multiple Sclerosis p. 83
NYSE Euronext: Reducing Data Latency and Enabling Rapid Ad-Hoc Searches p. 84
5 IBM's Enterprise Hadoop: InfoSphere BigInsights p. 85
What the Hadoop! p. 87
Where Elephants Come From: The History of Hadoop p. 88
Components of Hadoop and Related Projects p. 89
Hadoop 2.0 p. 89
What's in the Box: The Components of InfoSphere BigInsights p. 90
Hadoop Components Included in InfoSphere BigInsights 2.0 p. 91
The BigInsights Web Console p. 92
The BigInsights Development Tools p. 93
BigInsights Editions: Basic and Advanced p. 94
Deploying BigInsights p. 94
Ease of Use: A Simple Installation Process p. 94
A Low-Cost Way to Get Started: Running BigInsights on the Cloud p. 95
Higher-Class Hardware: IBM PowerLinux Solution for Big Data p. 96
Cloudera Support p. 96
Analytics: Exploration, Development, and Deployment p. 97
Advanced Text Analytics Toolkit p. 98
Machine Learning for the Masses: Deep Statistical Analysis on BigInsights p. 99
Analytic Accelerators: Finding Needles in Haystacks of Needles? p. 99
Apps for the Masses: Easy Deployment and Execution of Custom Applications p. 100
Data Discovery and Visualization: BigSheets p. 100
The BigInsights Development Environment p. 103
The BigInsights Application Lifecycle p. 105
Data Integration p. 106
The Analytics-Based IBM PureData Systems and DB2 p. 107
JDBC Module p. 108
InfoSphere Streams for Data in Motion p. 109
InfoSphere DataStage p. 109
Operational Excellence p. 110
Securing the Cluster p. 110
Monitoring All Aspects of Your Cluster p. 112
Compression p. 113
Improved Workload Scheduling: Intelligent Scheduler p. 117
Adaptive MapReduce p. 118
A Flexible File System for Hadoop: GPFS-FPO p. 120
Wrapping It Up p. 122
Part III Analytics for Big Data in Motion
6 Real-Time Analytical Processing with InfoSphere Streams p. 127
The Basics: InfoSphere Streams p. 128
How InfoSphere Streams Works p. 132
What's a Lowercase "stream"? p. 132
Programming Streams Made Easy p. 135
The Streams Processing Language p. 145
Source and Sink Adapters p. 147
Operators p. 149
Streams Toolkits p. 152
Enterprise Class p. 155
High Availability p. 155
Integration Is the Apex of Enterprise Class Analysis p. 157
Industry Use Cases for InfoSphere Streams p. 158
Telecommunications p. 158
Enforcement, Defense, Surveillance, and Cyber Security p. 159
Financial Services Sector p. 160
Health and Life Sciences p. 160
And the Rest We Can't Fit in This Book p. 161
Wrapping It Up p. 162
Part IV Unlocking Big Data
7 If Data Is the New Oil--You Need Data Exploration and Discovery p. 165
Indexing Data from Multiple Sources with InfoSphere Data Explorer p. 167
Connector Framework p. 167
The Data Explorer Processing Layer p. 169
User Management Layer p. 173
Beefing Up InfoSphere BigInsights p. 174
An App with a View: Creating Information Dashboards with InfoSphere Data Explorer Application Builder p. 175
Wrapping It Up: Data Explorer Unlocks Big Data p. 177
Part V Big Data Analytic Accelerators
8 Differentiate Yourself with Text Analytics p. 181
What Is Text Analysis? p. 183
The Annotated Query Language to the Rescue! p. 184
Productivity Tools That Make All the Difference p. 188
Wrapping It Up p. 190
9 The IBM Big Data Analytic Accelerators p. 191
The IBM Accelerator for Machine Data Analytics p. 192
Ingesting Machine Data p. 193
Extract p. 194
Index p. 196
Transform p. 196
Statistical Modeling p. 197
Visualization p. 197
Faceted Search p. 198
The IBM Accelerator for Social Data Analytics p. 198
Feedback Extractors: What Are People Saying? p. 200
Profile Extractors: Who Are These People? p. 200
Workflow: Pulling It All Together p. 201
The IBM Accelerator for Telecommunications Event Data Analytics p. 203
Call Detail Record Enrichment p. 205
Network Quality Monitoring p. 207
Customer Experience Indicators p. 207
Wrapping It Up: Accelerating Your Productivity p. 208
Part VI Integration and Governance in a Big Data World
10 To Govern or Not to Govern: Governance in a Big Data World p. 211
Why Should Big Data be Governed? p. 212
Competing on Information and Analytics p. 214
The Definition of Information Integration and Governance p. 216
An Information Governance Process p. 217
The IBM Information Integration and Governance Technology Platform p. 220
IBM InfoSphere Business Information Exchange p. 221
IBM InfoSphere Information Server p. 224
Data Quality p. 228
Master Data Management p. 229
Data Lifecycle Management p. 230
Privacy and Security p. 232
Wrapping It Up: Trust Is About Turning Big Data into Trusted Information p. 234
11 Integrating Big Data in the Enterprise p. 235
Analytic Application Integration p. 236
IBM Cognos Software p. 236
IBM Content Analytics with Enterprise Search p. 237
SPSS p. 237
SAS p. 238
Unica p. 238
Q1 Labs: Security Solutions p. 238
IBM i2 Intelligence Analysis Platform p. 239
Platform Symphony MapReduce p. 239
Component Integration Within the IBM Big Data Platform p. 240
InfoSphere BigInsights p. 240
InfoSphere Streams p. 241
Data Warehouse Solutions p. 241
The Advanced Text Analytics Toolkit p. 241
InfoSphere Data Explorer p. 242
InfoSphere Information Server p. 242
InfoSphere Master Data Management p. 243
InfoSphere Guardium p. 243
InfoSphere Optim p. 244
WebSphere Front Office p. 244
WebSphere Decision Server: iLog Rule p. 245
Rational p. 245
Data Repository-Level Integration p. 245
Enterprise Platform Plug-ins p. 246
Development Tooling p. 246
Analytics p. 246
Visualization p. 246
Wrapping It Up p. 247