Availability
Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status |
---|---|---|---|---|---|
 | 30000010194070 | QA76.9.D343 P74 2008 | Open Access Book | Book | On Order |
Summary
Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, causing concerns that personal data may be used for a variety of intrusive or malicious purposes.
Privacy-Preserving Data Mining: Models and Algorithms proposes a number of techniques for performing data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques; cryptographic methods and protocols for data sharing; statistical techniques for disclosure and inference control; query auditing methods; and randomization and perturbation-based techniques.
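As a rough illustration of the last category, the sketch below shows additive randomization: each record is released only after independent noise is added, yet aggregate statistics remain estimable because the noise distribution is known. This is a minimal sketch assuming NumPy; the salary-like data and noise scale are invented for the example, not taken from the book.

```python
import numpy as np

# Additive randomization: the collector sees x_i + r_i, never x_i itself.
rng = np.random.default_rng(seed=42)

NOISE_STD = 20_000.0  # publicly known noise scale (illustrative)
true_values = rng.normal(loc=50_000, scale=10_000, size=10_000)  # e.g. salaries
noise = rng.normal(loc=0.0, scale=NOISE_STD, size=true_values.size)
randomized = true_values + noise  # the only data that leaves each user's machine

# Because the noise has zero mean and known variance, first and second
# moments of the original distribution can be reconstructed in aggregate:
# E[X + R] = E[X] and Var(X + R) = Var(X) + Var(R).
est_mean = randomized.mean()
est_var = randomized.var() - NOISE_STD**2

print(f"estimated mean: {est_mean:.0f}, estimated std: {est_var**0.5:.0f}")
```

Chapter 6 of the volume surveys reconstruction methods (Bayes- and EM-based) that recover full distributions, not just moments, from data randomized in this way.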
This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions.
Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
Table of Contents
Preface | p. v |
List of Figures | p. xvii |
List of Tables | p. xxi |
1 An Introduction to Privacy-Preserving Data Mining | p. 1 |
1.1 Introduction | p. 1 |
1.2 Privacy-Preserving Data Mining Algorithms | p. 3 |
1.3 Conclusions and Summary | p. 7 |
References | p. 8 |
2 A General Survey of Privacy-Preserving Data Mining Models and Algorithms | p. 11 |
2.1 Introduction | p. 11 |
2.2 The Randomization Method | p. 13 |
2.2.1 Privacy Quantification | p. 15 |
2.2.2 Adversarial Attacks on Randomization | p. 18 |
2.2.3 Randomization Methods for Data Streams | p. 18 |
2.2.4 Multiplicative Perturbations | p. 19 |
2.2.5 Data Swapping | p. 19 |
2.3 Group Based Anonymization | p. 20 |
2.3.1 The k-Anonymity Framework | p. 20 |
2.3.2 Personalized Privacy-Preservation | p. 24 |
2.3.3 Utility Based Privacy Preservation | p. 24 |
2.3.4 Sequential Releases | p. 25 |
2.3.5 The l-diversity Method | p. 26 |
2.3.6 The t-closeness Model | p. 27 |
2.3.7 Models for Text, Binary and String Data | p. 27 |
2.4 Distributed Privacy-Preserving Data Mining | p. 28 |
2.4.1 Distributed Algorithms over Horizontally Partitioned Data Sets | p. 30 |
2.4.2 Distributed Algorithms over Vertically Partitioned Data | p. 31 |
2.4.3 Distributed Algorithms for k-Anonymity | p. 32 |
2.5 Privacy-Preservation of Application Results | p. 32 |
2.5.1 Association Rule Hiding | p. 33 |
2.5.2 Downgrading Classifier Effectiveness | p. 34 |
2.5.3 Query Auditing and Inference Control | p. 34 |
2.6 Limitations of Privacy: The Curse of Dimensionality | p. 37 |
2.7 Applications of Privacy-Preserving Data Mining | p. 38 |
2.7.1 Medical Databases: The Scrub and Datafly Systems | p. 39 |
2.7.2 Bioterrorism Applications | p. 40 |
2.7.3 Homeland Security Applications | p. 40 |
2.7.4 Genomic Privacy | p. 42 |
2.8 Summary | p. 43 |
References | p. 43 |
3 A Survey of Inference Control Methods for Privacy-Preserving Data Mining | p. 53 |
3.1 Introduction | p. 54 |
3.2 A Classification of Microdata Protection Methods | p. 55 |
3.3 Perturbative Masking Methods | p. 58 |
3.3.1 Additive Noise | p. 58 |
3.3.2 Microaggregation | p. 59 |
3.3.3 Data Swapping and Rank Swapping | p. 61 |
3.3.4 Rounding | p. 62 |
3.3.5 Resampling | p. 62 |
3.3.6 PRAM | p. 62 |
3.3.7 MASSC | p. 63 |
3.4 Non-perturbative Masking Methods | p. 63 |
3.4.1 Sampling | p. 64 |
3.4.2 Global Recoding | p. 64 |
3.4.3 Top and Bottom Coding | p. 65 |
3.4.4 Local Suppression | p. 65 |
3.5 Synthetic Microdata Generation | p. 65 |
3.5.1 Synthetic Data by Multiple Imputation | p. 65 |
3.5.2 Synthetic Data by Bootstrap | p. 66 |
3.5.3 Synthetic Data by Latin Hypercube Sampling | p. 66 |
3.5.4 Partially Synthetic Data by Cholesky Decomposition | p. 67 |
3.5.5 Other Partially Synthetic and Hybrid Microdata Approaches | p. 67 |
3.5.6 Pros and Cons of Synthetic Microdata | p. 68 |
3.6 Trading off Information Loss and Disclosure Risk | p. 69 |
3.6.1 Score Construction | p. 69 |
3.6.2 R-U Maps | p. 71 |
3.6.3 k-anonymity | p. 71 |
3.7 Conclusions and Research Directions | p. 72 |
References | p. 73 |
4 Measures of Anonymity | p. 81 |
4.1 Introduction | p. 81 |
4.1.1 What is Privacy? | p. 82 |
4.1.2 Data Anonymization Methods | p. 83 |
4.1.3 A Classification of Methods | p. 84 |
4.2 Statistical Measures of Anonymity | p. 85 |
4.2.1 Query Restriction | p. 85 |
4.2.2 Anonymity via Variance | p. 85 |
4.2.3 Anonymity via Multiplicity | p. 86 |
4.3 Probabilistic Measures of Anonymity | p. 87 |
4.3.1 Measures Based on Random Perturbation | p. 87 |
4.3.2 Measures Based on Generalization | p. 90 |
4.3.3 Utility vs Privacy | p. 94 |
4.4 Computational Measures of Anonymity | p. 94 |
4.4.1 Anonymity via Isolation | p. 97 |
4.5 Conclusions and New Directions | p. 97 |
4.5.1 New Directions | p. 98 |
References | p. 99 |
5 k-Anonymous Data Mining: A Survey | p. 105 |
5.1 Introduction | p. 105 |
5.2 k-Anonymity | p. 107 |
5.3 Algorithms for Enforcing k-Anonymity | p. 110 |
5.4 k-Anonymity Threats from Data Mining | p. 117 |
5.4.1 Association Rules | p. 118 |
5.4.2 Classification Mining | p. 118 |
5.5 k-Anonymity in Data Mining | p. 120 |
5.6 Anonymize-and-Mine | p. 123 |
5.7 Mine-and-Anonymize | p. 126 |
5.7.1 Enforcing k-Anonymity on Association Rules | p. 126 |
5.7.2 Enforcing k-Anonymity on Decision Trees | p. 130 |
5.8 Conclusions | p. 133 |
Acknowledgments | p. 133 |
References | p. 134 |
6 A Survey of Randomization Methods for Privacy-Preserving Data Mining | p. 137 |
6.1 Introduction | p. 137 |
6.2 Reconstruction Methods for Randomization | p. 139 |
6.2.1 The Bayes Reconstruction Method | p. 139 |
6.2.2 The EM Reconstruction Method | p. 141 |
6.2.3 Utility and Optimality of Randomization Models | p. 143 |
6.3 Applications of Randomization | p. 144 |
6.3.1 Privacy-Preserving Classification with Randomization | p. 144 |
6.3.2 Privacy-Preserving OLAP | p. 145 |
6.3.3 Collaborative Filtering | p. 145 |
6.4 The Privacy-Information Loss Tradeoff | p. 146 |
6.5 Vulnerabilities of the Randomization Method | p. 149 |
6.6 Randomization of Time Series Data Streams | p. 151 |
6.7 Multiplicative Noise for Randomization | p. 152 |
6.7.1 Vulnerabilities of Multiplicative Randomization | p. 153 |
6.7.2 Sketch Based Randomization | p. 153 |
6.8 Conclusions and Summary | p. 154 |
References | p. 154 |
7 A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining | p. 157 |
7.1 Introduction | p. 158 |
7.1.1 Data Privacy vs. Data Utility | p. 159 |
7.1.2 Outline | p. 160 |
7.2 Definition of Multiplicative Perturbation | p. 161 |
7.2.1 Notations | p. 161 |
7.2.2 Rotation Perturbation | p. 161 |
7.2.3 Projection Perturbation | p. 162 |
7.2.4 Sketch-based Approach | p. 164 |
7.2.5 Geometric Perturbation | p. 164 |
7.3 Transformation Invariant Data Mining Models | p. 165 |
7.3.1 Definition of Transformation Invariant Models | p. 166 |
7.3.2 Transformation-Invariant Classification Models | p. 166 |
7.3.3 Transformation-Invariant Clustering Models | p. 167 |
7.4 Privacy Evaluation for Multiplicative Perturbation | p. 168 |
7.4.1 A Conceptual Multidimensional Privacy Evaluation Model | p. 168 |
7.4.2 Variance of Difference as Column Privacy Metric | p. 169 |
7.4.3 Incorporating Attack Evaluation | p. 170 |
7.4.4 Other Metrics | p. 171 |
7.5 Attack Resilient Multiplicative Perturbations | p. 171 |
7.5.1 Naive Estimation to Rotation Perturbation | p. 171 |
7.5.2 ICA-Based Attacks | p. 173 |
7.5.3 Distance-Inference Attacks | p. 174 |
7.5.4 Attacks with More Prior Knowledge | p. 176 |
7.5.5 Finding Attack-Resilient Perturbations | p. 177 |
7.6 Conclusion | p. 177 |
Acknowledgment | p. 178 |
References | p. 179 |
8 A Survey of Quantification of Privacy Preserving Data Mining Algorithms | p. 183 |
8.1 Introduction | p. 184 |
8.2 Metrics for Quantifying Privacy Level | p. 186 |
8.2.1 Data Privacy | p. 186 |
8.2.2 Result Privacy | p. 191 |
8.3 Metrics for Quantifying Hiding Failure | p. 192 |
8.4 Metrics for Quantifying Data Quality | p. 193 |
8.4.1 Quality of the Data Resulting from the PPDM Process | p. 193 |
8.4.2 Quality of the Data Mining Results | p. 198 |
8.5 Complexity Metrics | p. 200 |
8.6 How to Select a Proper Metric | p. 201 |
8.7 Conclusion and Research Directions | p. 202 |
References | p. 202 |
9 A Survey of Utility-based Privacy-Preserving Data Transformation Methods | p. 207 |
9.1 Introduction | p. 208 |
9.1.1 What is Utility-based Privacy Preservation? | p. 209 |
9.2 Types of Utility-based Privacy Preservation Methods | p. 210 |
9.2.1 Privacy Models | p. 210 |
9.2.2 Utility Measures | p. 212 |
9.2.3 Summary of the Utility-Based Privacy Preserving Methods | p. 214 |
9.3 Utility-Based Anonymization Using Local Recoding | p. 214 |
9.3.1 Global Recoding and Local Recoding | p. 215 |
9.3.2 Utility Measure | p. 216 |
9.3.3 Anonymization Methods | p. 217 |
9.3.4 Summary and Discussion | p. 219 |
9.4 The Utility-based Privacy Preserving Methods in Classification Problems | p. 219 |
9.4.1 The Top-Down Specialization Method | p. 220 |
9.4.2 The Progressive Disclosure Algorithm | p. 224 |
9.4.3 Summary and Discussion | p. 228 |
9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets | p. 228 |
9.5.1 Anonymized Marginal | p. 229 |
9.5.2 Utility Measure | p. 230 |
9.5.3 Injecting Utility Using Anonymized Marginals | p. 231 |
9.5.4 Summary and Discussion | p. 233 |
9.6 Summary | p. 234 |
Acknowledgments | p. 234 |
References | p. 234 |
10 Mining Association Rules under Privacy Constraints | p. 239 |
10.1 Introduction | p. 239 |
10.2 Problem Framework | p. 240 |
10.2.1 Database Model | p. 240 |
10.2.2 Mining Objective | p. 241 |
10.2.3 Privacy Mechanisms | p. 241 |
10.2.4 Privacy Metric | p. 243 |
10.2.5 Accuracy Metric | p. 245 |
10.3 Evolution of the Literature | p. 246 |
10.4 The FRAPP Framework | p. 251 |
10.4.1 Reconstruction Model | p. 252 |
10.4.2 Estimation Error | p. 253 |
10.4.3 Randomizing the Perturbation Matrix | p. 256 |
10.4.4 Efficient Perturbation | p. 256 |
10.4.5 Integration with Association Rule Mining | p. 258 |
10.5 Sample Results | p. 259 |
10.6 Closing Remarks | p. 263 |
Acknowledgments | p. 263 |
References | p. 263 |
11 A Survey of Association Rule Hiding Methods for Privacy | p. 267 |
11.1 Introduction | p. 267 |
11.2 Terminology and Preliminaries | p. 269 |
11.3 Taxonomy of Association Rule Hiding Algorithms | p. 270 |
11.4 Classes of Association Rule Algorithms | p. 271 |
11.4.1 Heuristic Approaches | p. 272 |
11.4.2 Border-based Approaches | p. 277 |
11.4.3 Exact Approaches | p. 278 |
11.5 Other Hiding Approaches | p. 279 |
11.6 Metrics and Performance Analysis | p. 281 |
11.7 Discussion and Future Trends | p. 284 |
11.8 Conclusions | p. 285 |
References | p. 286 |
12 A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries | p. 291 |
12.1 Introduction | p. 291 |
12.2 The Statistical Approach to Privacy Protection | p. 292 |
12.3 Data Mining Algorithms, Association Rules, and Disclosure Limitation | p. 294 |
12.4 Estimation and Disclosure Limitation for Multi-way Contingency Tables | p. 295 |
12.5 Two Illustrative Examples | p. 301 |
12.5.1 Example 1: Data from a Randomized Clinical Trial | p. 301 |
12.5.2 Example 2: Data from the 1993 U.S. Current Population Survey | p. 305 |
12.6 Conclusions | p. 308 |
Acknowledgments | p. 309 |
References | p. 309 |
13 A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data | p. 313 |
13.1 Introduction | p. 313 |
13.2 Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining | p. 315 |
13.3 Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining | p. 318 |
13.4 Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data | p. 323 |
13.5 Comparison to Vertically Partitioned Data Model | p. 326 |
13.6 Extension to Malicious Parties | p. 327 |
13.7 Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining | p. 329 |
13.8 Privacy Issues Related to Data Mining Results | p. 330 |
13.9 Conclusion | p. 332 |
References | p. 332 |
14 A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data | p. 337 |
14.1 Introduction | p. 337 |
14.2 Classification | p. 341 |
14.2.1 Naïve Bayes Classification | p. 342 |
14.2.2 Bayesian Network Structure Learning | p. 343 |
14.2.3 Decision Tree Classification | p. 344 |
14.3 Clustering | p. 346 |
14.4 Association Rule Mining | p. 347 |
14.5 Outlier Detection | p. 349 |
14.5.1 Algorithm | p. 351 |
14.5.2 Security Analysis | p. 352 |
14.5.3 Computation and Communication Analysis | p. 354 |
14.6 Challenges and Research Directions | p. 355 |
References | p. 356 |
15 A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods | p. 359 |
15.1 Introduction | p. 360 |
15.2 Definitions and Notation | p. 360 |
15.3 Attacking Additive Data Perturbation | p. 361 |
15.3.1 Eigen-Analysis and PCA Preliminaries | p. 362 |
15.3.2 Spectral Filtering | p. 363 |
15.3.3 SVD Filtering | p. 364 |
15.3.4 PCA Filtering | p. 365 |
15.3.5 MAP Estimation Attack | p. 366 |
15.3.6 Distribution Analysis Attack | p. 367 |
15.3.7 Summary | p. 367 |
15.4 Attacking Matrix Multiplicative Data Perturbation | p. 369 |
15.4.1 Known I/O Attacks | p. 370 |
15.4.2 Known Sample Attack | p. 373 |
15.4.3 Other Attacks Based on ICA | p. 374 |
15.4.4 Summary | p. 375 |
15.5 Attacking k-Anonymization | p. 376 |
15.6 Conclusion | p. 376 |
Acknowledgments | p. 377 |
References | p. 377 |
16 Private Data Analysis via Output Perturbation | p. 383 |
16.1 Introduction | p. 383 |
16.2 The Abstract Model - Statistical Databases, Queries, and Sanitizers | p. 385 |
16.3 Privacy | p. 388 |
16.3.1 Interpreting the Privacy Definition | p. 390 |
16.4 The Basic Technique: Calibrating Noise to Sensitivity | p. 394 |
16.4.1 Applications: Functions with Low Global Sensitivity | p. 396 |
16.5 Constructing Sanitizers for Complex Functionalities | p. 400 |
16.5.1 k-Means Clustering | p. 401 |
16.5.2 SVD and PCA | p. 403 |
16.5.3 Learning in the Statistical Queries Model | p. 404 |
16.6 Beyond the Basics | p. 405 |
16.6.1 Instance Based Noise and Smooth Sensitivity | p. 406 |
16.6.2 The Sample-Aggregate Framework | p. 408 |
16.6.3 A General Sanitization Mechanism | p. 409 |
16.7 Related Work and Bibliographic Notes | p. 409 |
Acknowledgments | p. 411 |
References | p. 411 |
17 A Survey of Query Auditing Techniques for Data Privacy | p. 415 |
17.1 Introduction | p. 415 |
17.2 Auditing Aggregate Queries | p. 416 |
17.2.1 Offline Auditing | p. 417 |
17.2.2 Online Auditing | p. 418 |
17.3 Auditing Select-Project-Join Queries | p. 426 |
17.4 Challenges in Auditing | p. 427 |
17.5 Reading | p. 429 |
References | p. 430 |
18 Privacy and the Dimensionality Curse | p. 433 |
18.1 Introduction | p. 433 |
18.2 The Dimensionality Curse and the k-anonymity Method | p. 435 |
18.3 The Dimensionality Curse and Condensation | p. 441 |
18.4 The Dimensionality Curse and the Randomization Method | p. 446 |
18.4.1 Effects of Public Information | p. 446 |
18.4.2 Effects of High Dimensionality | p. 450 |
18.4.3 Gaussian Perturbing Distribution | p. 450 |
18.4.4 Uniform Perturbing Distribution | p. 455 |
18.5 The Dimensionality Curse and l-diversity | p. 458 |
18.6 Conclusions and Research Directions | p. 459 |
References | p. 460 |
19 Personalized Privacy Preservation | p. 461 |
19.1 Introduction | p. 461 |
19.2 Formalization of Personalized Anonymity | p. 463 |
19.2.1 Personal Privacy Requirements | p. 464 |
19.2.2 Generalization | p. 465 |
19.3 Combinatorial Process of Privacy Attack | p. 467 |
19.3.1 Primary Case | p. 468 |
19.3.2 Non-primary Case | p. 469 |
19.4 Theoretical Foundation | p. 470 |
19.4.1 Notations and Basic Properties | p. 471 |
19.4.2 Derivation of the Breach Probability | p. 472 |
19.5 Generalization Algorithm | p. 473 |
19.5.1 The Greedy Framework | p. 474 |
19.5.2 Optimal SA-generalization | p. 476 |
19.6 Alternative Forms of Personalized Privacy Preservation | p. 478 |
19.6.1 Extension of k-anonymity | p. 479 |
19.6.2 Personalization in Location Privacy Protection | p. 480 |
19.7 Summary and Future Work | p. 482 |
References | p. 485 |
20 Privacy-Preserving Data Stream Classification | p. 487 |
20.1 Introduction | p. 487 |
20.1.1 Motivating Example | p. 488 |
20.1.2 Contributions and Paper Outline | p. 490 |
20.2 Related Works | p. 491 |
20.3 Problem Statement | p. 493 |
20.3.1 Secure Join Stream Classification | p. 493 |
20.3.2 Naive Bayesian Classifiers | p. 494 |
20.4 Our Approach | p. 495 |
20.4.1 Initialization | p. 495 |
20.4.2 Bottom-Up Propagation | p. 496 |
20.4.3 Top-Down Propagation | p. 497 |
20.4.4 Using NBC | p. 499 |
20.4.5 Algorithm Analysis | p. 500 |
20.5 Empirical Studies | p. 501 |
20.5.1 Real-life Datasets | p. 502 |
20.5.2 Synthetic Datasets | p. 504 |
20.5.3 Discussion | p. 506 |
20.6 Conclusions | p. 507 |
References | p. 508 |
Index | p. 511 |