Title: Privacy-preserving data mining : models and algorithms
Publication Information: Berlin : Springer, 2008
Physical Description: xxii, 513 p. : ill. ; 25 cm.
ISBN: 9780387709918

Availability:
Item Barcode: 30000010194070
Call Number: QA76.9.D343 P74 2008
Material Type: Book
Item Category 1: Open Access Book
Summary

Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, causing concerns that personal data may be used for a variety of intrusive or malicious purposes.

Privacy-Preserving Data Mining: Models and Algorithms proposes a number of techniques for performing data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques; cryptographic methods and protocols for data sharing; statistical techniques for disclosure and inference control; query auditing methods; and randomization and perturbation-based techniques.
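As a minimal sketch of the randomization and perturbation-based category (this example is not from the book; the `randomize` helper and the noise scale are illustrative assumptions), additive zero-mean noise hides individual values while still allowing aggregate statistics to be estimated from the perturbed data:

```python
import random

def randomize(values, scale=10.0, rng=None):
    """Perturb each value with independent zero-mean Gaussian noise.

    The data collector sees only the noisy values, yet aggregates such
    as the mean remain estimable, because independent zero-mean noise
    averages out over many records. (Illustrative sketch only; real
    schemes also reconstruct the full distribution, e.g. via EM.)
    """
    rng = rng or random.Random(0)  # fixed seed keeps the demo reproducible
    return [v + rng.gauss(0.0, scale) for v in values]

ages = [23, 34, 45, 31, 52, 29, 41, 38] * 500  # toy "sensitive" column
noisy = randomize(ages)

true_mean = sum(ages) / len(ages)
est_mean = sum(noisy) / len(noisy)
# With zero-mean noise over thousands of records, the estimated mean
# stays close to the true mean, while any single noisy record reveals
# little about its original value.
assert abs(true_mean - est_mean) < 1.0
```

The design tension this illustrates is exactly the book's recurring theme: a larger noise scale gives each individual more privacy but widens the error of any reconstructed statistic.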

This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions.

Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.


Table of Contents

Contributors: Charu C. Aggarwal and Philip S. Yu; Josep Domingo-Ferrer; Suresh Venkatasubramanian; V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati; Keke Chen and Ling Liu; Elisa Bertino, Dan Lin, and Wei Jiang; Ming Hua and Jian Pei; Jayant R. Haritsa; Vassilios S. Verykios and Aris Gkoulalas-Divanis; Stephen E. Fienberg and Aleksandra B. Slavkovic; Murat Kantarcioglu; Jaideep Vaidya; Kun Liu, Chris Giannella, and Hillol Kargupta; Kobbi Nissim; Shubha U. Nabar, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani; Charu C. Aggarwal; Yufei Tao and Xiaokui Xiao; Yabo Xu, Ke Wang, Ada Wai-Chee Fu, Rong She, and Jian Pei
Preface p. v
List of Figures p. xvii
List of Tables p. xxi
1 An Introduction to Privacy-Preserving Data Mining p. 1
1.1 Introduction p. 1
1.2 Privacy-Preserving Data Mining Algorithms p. 3
1.3 Conclusions and Summary p. 7
References p. 8
2 A General Survey of Privacy-Preserving Data Mining Models and Algorithms p. 11
2.1 Introduction p. 11
2.2 The Randomization Method p. 13
2.2.1 Privacy Quantification p. 15
2.2.2 Adversarial Attacks on Randomization p. 18
2.2.3 Randomization Methods for Data Streams p. 18
2.2.4 Multiplicative Perturbations p. 19
2.2.5 Data Swapping p. 19
2.3 Group Based Anonymization p. 20
2.3.1 The k-Anonymity Framework p. 20
2.3.2 Personalized Privacy-Preservation p. 24
2.3.3 Utility Based Privacy Preservation p. 24
2.3.4 Sequential Releases p. 25
2.3.5 The l-diversity Method p. 26
2.3.6 The t-closeness Model p. 27
2.3.7 Models for Text, Binary and String Data p. 27
2.4 Distributed Privacy-Preserving Data Mining p. 28
2.4.1 Distributed Algorithms over Horizontally Partitioned Data Sets p. 30
2.4.2 Distributed Algorithms over Vertically Partitioned Data p. 31
2.4.3 Distributed Algorithms for k-Anonymity p. 32
2.5 Privacy-Preservation of Application Results p. 32
2.5.1 Association Rule Hiding p. 33
2.5.2 Downgrading Classifier Effectiveness p. 34
2.5.3 Query Auditing and Inference Control p. 34
2.6 Limitations of Privacy: The Curse of Dimensionality p. 37
2.7 Applications of Privacy-Preserving Data Mining p. 38
2.7.1 Medical Databases: The Scrub and Datafly Systems p. 39
2.7.2 Bioterrorism Applications p. 40
2.7.3 Homeland Security Applications p. 40
2.7.4 Genomic Privacy p. 42
2.8 Summary p. 43
References p. 43
3 A Survey of Inference Control Methods for Privacy-Preserving Data Mining p. 53
3.1 Introduction p. 54
3.2 A Classification of Microdata Protection Methods p. 55
3.3 Perturbative Masking Methods p. 58
3.3.1 Additive Noise p. 58
3.3.2 Microaggregation p. 59
3.3.3 Data Swapping and Rank Swapping p. 61
3.3.4 Rounding p. 62
3.3.5 Resampling p. 62
3.3.6 PRAM p. 62
3.3.7 MASSC p. 63
3.4 Non-perturbative Masking Methods p. 63
3.4.1 Sampling p. 64
3.4.2 Global Recoding p. 64
3.4.3 Top and Bottom Coding p. 65
3.4.4 Local Suppression p. 65
3.5 Synthetic Microdata Generation p. 65
3.5.1 Synthetic Data by Multiple Imputation p. 65
3.5.2 Synthetic Data by Bootstrap p. 66
3.5.3 Synthetic Data by Latin Hypercube Sampling p. 66
3.5.4 Partially Synthetic Data by Cholesky Decomposition p. 67
3.5.5 Other Partially Synthetic and Hybrid Microdata Approaches p. 67
3.5.6 Pros and Cons of Synthetic Microdata p. 68
3.6 Trading off Information Loss and Disclosure Risk p. 69
3.6.1 Score Construction p. 69
3.6.2 R-U Maps p. 71
3.6.3 k-anonymity p. 71
3.7 Conclusions and Research Directions p. 72
References p. 73
4 Measures of Anonymity p. 81
4.1 Introduction p. 81
4.1.1 What is Privacy? p. 82
4.1.2 Data Anonymization Methods p. 83
4.1.3 A Classification of Methods p. 84
4.2 Statistical Measures of Anonymity p. 85
4.2.1 Query Restriction p. 85
4.2.2 Anonymity via Variance p. 85
4.2.3 Anonymity via Multiplicity p. 86
4.3 Probabilistic Measures of Anonymity p. 87
4.3.1 Measures Based on Random Perturbation p. 87
4.3.2 Measures Based on Generalization p. 90
4.3.3 Utility vs Privacy p. 94
4.4 Computational Measures of Anonymity p. 94
4.4.1 Anonymity via Isolation p. 97
4.5 Conclusions and New Directions p. 97
4.5.1 New Directions p. 98
References p. 99
5 k-Anonymous Data Mining: A Survey p. 105
5.1 Introduction p. 105
5.2 k-Anonymity p. 107
5.3 Algorithms for Enforcing k-Anonymity p. 110
5.4 k-Anonymity Threats from Data Mining p. 117
5.4.1 Association Rules p. 118
5.4.2 Classification Mining p. 118
5.5 k-Anonymity in Data Mining p. 120
5.6 Anonymize-and-Mine p. 123
5.7 Mine-and-Anonymize p. 126
5.7.1 Enforcing k-Anonymity on Association Rules p. 126
5.7.2 Enforcing k-Anonymity on Decision Trees p. 130
5.8 Conclusions p. 133
Acknowledgments p. 133
References p. 134
6 A Survey of Randomization Methods for Privacy-Preserving Data Mining p. 137
6.1 Introduction p. 137
6.2 Reconstruction Methods for Randomization p. 139
6.2.1 The Bayes Reconstruction Method p. 139
6.2.2 The EM Reconstruction Method p. 141
6.2.3 Utility and Optimality of Randomization Models p. 143
6.3 Applications of Randomization p. 144
6.3.1 Privacy-Preserving Classification with Randomization p. 144
6.3.2 Privacy-Preserving OLAP p. 145
6.3.3 Collaborative Filtering p. 145
6.4 The Privacy-Information Loss Tradeoff p. 146
6.5 Vulnerabilities of the Randomization Method p. 149
6.6 Randomization of Time Series Data Streams p. 151
6.7 Multiplicative Noise for Randomization p. 152
6.7.1 Vulnerabilities of Multiplicative Randomization p. 153
6.7.2 Sketch Based Randomization p. 153
6.8 Conclusions and Summary p. 154
References p. 154
7 A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining p. 157
7.1 Introduction p. 158
7.1.1 Data Privacy vs. Data Utility p. 159
7.1.2 Outline p. 160
7.2 Definition of Multiplicative Perturbation p. 161
7.2.1 Notations p. 161
7.2.2 Rotation Perturbation p. 161
7.2.3 Projection Perturbation p. 162
7.2.4 Sketch-based Approach p. 164
7.2.5 Geometric Perturbation p. 164
7.3 Transformation Invariant Data Mining Models p. 165
7.3.1 Definition of Transformation Invariant Models p. 166
7.3.2 Transformation-Invariant Classification Models p. 166
7.3.3 Transformation-Invariant Clustering Models p. 167
7.4 Privacy Evaluation for Multiplicative Perturbation p. 168
7.4.1 A Conceptual Multidimensional Privacy Evaluation Model p. 168
7.4.2 Variance of Difference as Column Privacy Metric p. 169
7.4.3 Incorporating Attack Evaluation p. 170
7.4.4 Other Metrics p. 171
7.5 Attack Resilient Multiplicative Perturbations p. 171
7.5.1 Naive Estimation to Rotation Perturbation p. 171
7.5.2 ICA-Based Attacks p. 173
7.5.3 Distance-Inference Attacks p. 174
7.5.4 Attacks with More Prior Knowledge p. 176
7.5.5 Finding Attack-Resilient Perturbations p. 177
7.6 Conclusion p. 177
Acknowledgment p. 178
References p. 179
8 A Survey of Quantification of Privacy Preserving Data Mining Algorithms p. 183
8.1 Introduction p. 184
8.2 Metrics for Quantifying Privacy Level p. 186
8.2.1 Data Privacy p. 186
8.2.2 Result Privacy p. 191
8.3 Metrics for Quantifying Hiding Failure p. 192
8.4 Metrics for Quantifying Data Quality p. 193
8.4.1 Quality of the Data Resulting from the PPDM Process p. 193
8.4.2 Quality of the Data Mining Results p. 198
8.5 Complexity Metrics p. 200
8.6 How to Select a Proper Metric p. 201
8.7 Conclusion and Research Directions p. 202
References p. 202
9 A Survey of Utility-based Privacy-Preserving Data Transformation Methods p. 207
9.1 Introduction p. 208
9.1.1 What is Utility-based Privacy Preservation? p. 209
9.2 Types of Utility-based Privacy Preservation Methods p. 210
9.2.1 Privacy Models p. 210
9.2.2 Utility Measures p. 212
9.2.3 Summary of the Utility-Based Privacy Preserving Methods p. 214
9.3 Utility-Based Anonymization Using Local Recoding p. 214
9.3.1 Global Recoding and Local Recoding p. 215
9.3.2 Utility Measure p. 216
9.3.3 Anonymization Methods p. 217
9.3.4 Summary and Discussion p. 219
9.4 The Utility-based Privacy Preserving Methods in Classification Problems p. 219
9.4.1 The Top-Down Specialization Method p. 220
9.4.2 The Progressive Disclosure Algorithm p. 224
9.4.3 Summary and Discussion p. 228
9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets p. 228
9.5.1 Anonymized Marginal p. 229
9.5.2 Utility Measure p. 230
9.5.3 Injecting Utility Using Anonymized Marginals p. 231
9.5.4 Summary and Discussion p. 233
9.6 Summary p. 234
Acknowledgments p. 234
References p. 234
10 Mining Association Rules under Privacy Constraints p. 239
10.1 Introduction p. 239
10.2 Problem Framework p. 240
10.2.1 Database Model p. 240
10.2.2 Mining Objective p. 241
10.2.3 Privacy Mechanisms p. 241
10.2.4 Privacy Metric p. 243
10.2.5 Accuracy Metric p. 245
10.3 Evolution of the Literature p. 246
10.4 The FRAPP Framework p. 251
10.4.1 Reconstruction Model p. 252
10.4.2 Estimation Error p. 253
10.4.3 Randomizing the Perturbation Matrix p. 256
10.4.4 Efficient Perturbation p. 256
10.4.5 Integration with Association Rule Mining p. 258
10.5 Sample Results p. 259
10.6 Closing Remarks p. 263
Acknowledgments p. 263
References p. 263
11 A Survey of Association Rule Hiding Methods for Privacy p. 267
11.1 Introduction p. 267
11.2 Terminology and Preliminaries p. 269
11.3 Taxonomy of Association Rule Hiding Algorithms p. 270
11.4 Classes of Association Rule Algorithms p. 271
11.4.1 Heuristic Approaches p. 272
11.4.2 Border-based Approaches p. 277
11.4.3 Exact Approaches p. 278
11.5 Other Hiding Approaches p. 279
11.6 Metrics and Performance Analysis p. 281
11.7 Discussion and Future Trends p. 284
11.8 Conclusions p. 285
References p. 286
12 A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries p. 291
12.1 Introduction p. 291
12.2 The Statistical Approach to Privacy Protection p. 292
12.3 Datamining Algorithms, Association Rules, and Disclosure Limitation p. 294
12.4 Estimation and Disclosure Limitation for Multi-way Contingency Tables p. 295
12.5 Two Illustrative Examples p. 301
12.5.1 Example 1: Data from a Randomized Clinical Trial p. 301
12.5.2 Example 2: Data from the 1993 U.S. Current Population Survey p. 305
12.6 Conclusions p. 308
Acknowledgments p. 309
References p. 309
13 A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data p. 313
13.1 Introduction p. 313
13.2 Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining p. 315
13.3 Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining p. 318
13.4 Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data p. 323
13.5 Comparison to Vertically Partitioned Data Model p. 326
13.6 Extension to Malicious Parties p. 327
13.7 Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining p. 329
13.8 Privacy Issues Related to Data Mining Results p. 330
13.9 Conclusion p. 332
References p. 332
14 A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data p. 337
14.1 Introduction p. 337
14.2 Classification p. 341
14.2.1 Naïve Bayes Classification p. 342
14.2.2 Bayesian Network Structure Learning p. 343
14.2.3 Decision Tree Classification p. 344
14.3 Clustering p. 346
14.4 Association Rule Mining p. 347
14.5 Outlier Detection p. 349
14.5.1 Algorithm p. 351
14.5.2 Security Analysis p. 352
14.5.3 Computation and Communication Analysis p. 354
14.6 Challenges and Research Directions p. 355
References p. 356
15 A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods p. 359
15.1 Introduction p. 360
15.2 Definitions and Notation p. 360
15.3 Attacking Additive Data Perturbation p. 361
15.3.1 Eigen-Analysis and PCA Preliminaries p. 362
15.3.2 Spectral Filtering p. 363
15.3.3 SVD Filtering p. 364
15.3.4 PCA Filtering p. 365
15.3.5 MAP Estimation Attack p. 366
15.3.6 Distribution Analysis Attack p. 367
15.3.7 Summary p. 367
15.4 Attacking Matrix Multiplicative Data Perturbation p. 369
15.4.1 Known I/O Attacks p. 370
15.4.2 Known Sample Attack p. 373
15.4.3 Other Attacks Based on ICA p. 374
15.4.4 Summary p. 375
15.5 Attacking k-Anonymization p. 376
15.6 Conclusion p. 376
Acknowledgments p. 377
References p. 377
16 Private Data Analysis via Output Perturbation p. 383
16.1 Introduction p. 383
16.2 The Abstract Model - Statistical Databases, Queries, and Sanitizers p. 385
16.3 Privacy p. 388
16.3.1 Interpreting the Privacy Definition p. 390
16.4 The Basic Technique: Calibrating Noise to Sensitivity p. 394
16.4.1 Applications: Functions with Low Global Sensitivity p. 396
16.5 Constructing Sanitizers for Complex Functionalities p. 400
16.5.1 k-Means Clustering p. 401
16.5.2 SVD and PCA p. 403
16.5.3 Learning in the Statistical Queries Model p. 404
16.6 Beyond the Basics p. 405
16.6.1 Instance Based Noise and Smooth Sensitivity p. 406
16.6.2 The Sample-Aggregate Framework p. 408
16.6.3 A General Sanitization Mechanism p. 409
16.7 Related Work and Bibliographic Notes p. 409
Acknowledgments p. 411
References p. 411
17 A Survey of Query Auditing Techniques for Data Privacy p. 415
17.1 Introduction p. 415
17.2 Auditing Aggregate Queries p. 416
17.2.1 Offline Auditing p. 417
17.2.2 Online Auditing p. 418
17.3 Auditing Select-Project-Join Queries p. 426
17.4 Challenges in Auditing p. 427
17.5 Reading p. 429
References p. 430
18 Privacy and the Dimensionality Curse p. 433
18.1 Introduction p. 433
18.2 The Dimensionality Curse and the k-anonymity Method p. 435
18.3 The Dimensionality Curse and Condensation p. 441
18.4 The Dimensionality Curse and the Randomization Method p. 446
18.4.1 Effects of Public Information p. 446
18.4.2 Effects of High Dimensionality p. 450
18.4.3 Gaussian Perturbing Distribution p. 450
18.4.4 Uniform Perturbing Distribution p. 455
18.5 The Dimensionality Curse and l-diversity p. 458
18.6 Conclusions and Research Directions p. 459
References p. 460
19 Personalized Privacy Preservation p. 461
19.1 Introduction p. 461
19.2 Formalization of Personalized Anonymity p. 463
19.2.1 Personal Privacy Requirements p. 464
19.2.2 Generalization p. 465
19.3 Combinatorial Process of Privacy Attack p. 467
19.3.1 Primary Case p. 468
19.3.2 Non-primary Case p. 469
19.4 Theoretical Foundation p. 470
19.4.1 Notations and Basic Properties p. 471
19.4.2 Derivation of the Breach Probability p. 472
19.5 Generalization Algorithm p. 473
19.5.1 The Greedy Framework p. 474
19.5.2 Optimal SA-generalization p. 476
19.6 Alternative Forms of Personalized Privacy Preservation p. 478
19.6.1 Extension of k-anonymity p. 479
19.6.2 Personalization in Location Privacy Protection p. 480
19.7 Summary and Future Work p. 482
References p. 485
20 Privacy-Preserving Data Stream Classification p. 487
20.1 Introduction p. 487
20.1.1 Motivating Example p. 488
20.1.2 Contributions and Paper Outline p. 490
20.2 Related Works p. 491
20.3 Problem Statement p. 493
20.3.1 Secure Join Stream Classification p. 493
20.3.2 Naive Bayesian Classifiers p. 494
20.4 Our Approach p. 495
20.4.1 Initialization p. 495
20.4.2 Bottom-Up Propagation p. 496
20.4.3 Top-Down Propagation p. 497
20.4.4 Using NBC p. 499
20.4.5 Algorithm Analysis p. 500
20.5 Empirical Studies p. 501
20.5.1 Real-life Datasets p. 502
20.5.2 Synthetic Datasets p. 504
20.5.3 Discussion p. 506
20.6 Conclusions p. 507
References p. 508
Index p. 511