Knowledge discovery in multiple databases

Many organizations urgently need to mine multiple databases that are inherently distributed across their branches (distributed data). In particular, as the Web rapidly becomes a flood of information, individuals and organizations can take low-cost information and knowledge on the Internet into account when making decisions. How to efficiently identify quality knowledge from different data sources has become a significant challenge. This challenge has attracted a great many researchers, including the authors, who have developed local pattern analysis, a new strategy for discovering some kinds of potentially useful patterns that cannot be mined with traditional multi-database mining techniques. Local pattern analysis delivers high-performance pattern discovery from multiple databases. Considerable progress has been made on multi-database mining in such areas as hierarchical meta-learning, collective mining, database classification, and peculiarity discovery. While these techniques continue to be topics of future interest in multi-database mining, this book focuses on these interesting issues under the framework of local pattern analysis.

The book is intended for researchers and students in data mining, distributed data analysis, and machine learning, and for anyone else who is interested in multi-database mining. It is also appropriate for use as a text supplement for broader courses that might also involve knowledge discovery in databases and data mining.

1 Importance of Multi-database Mining	p. 1
1.1 Introduction	p. 1
1.2 Role of Multi-database Mining in Real-world Applications	p. 2
1.3 Multi-database Mining Problems	p. 4
1.4 Differences Between Mono- and Multi-database Mining	p. 6
1.4.1 Features of Data in Multi-databases	p. 6
1.4.2 Features of Patterns in Multi-databases	p. 8
1.5 Evolution of Multi-database Mining	p. 9
1.6 Limitations of Previous Techniques	p. 12
1.7 Process of Multi-database Mining	p. 14
1.7.1 Description of Multi-database Mining	p. 14
1.7.2 Practical Issues in the Process	p. 16
1.8 Features of the Defined Process	p. 20
1.9 Major Contributions of This Book	p. 23
1.10 Organization of the Book	p. 24
2 Data Mining and Multi-database Mining	p. 27
2.1 Introduction	p. 27
2.2 Knowledge Discovery in Databases	p. 28
2.2.1 Processing Steps of KDD	p. 28
2.2.2 Data Pre-processing	p. 30
2.2.3 Data Mining	p. 31
2.2.4 Post Data Mining	p. 33
2.2.5 Applications of KDD	p. 34
2.3 Association Rule Mining	p. 36
2.4 Research into Mining Mono-databases	p. 41
2.5 Research into Mining Multi-databases	p. 51
2.5.1 Parallel Data Mining	p. 51
2.5.2 Distributed Data Mining	p. 52
2.5.3 Application-dependent Database Selection	p. 58
2.5.4 Peculiarity-oriented Multi-database Mining	p. 59
2.6 Summary	p. 61
3 Local Pattern Analysis	p. 63
3.1 Introduction	p. 63
3.2 Previous Multi-database Mining Techniques	p. 64
3.3 Local Patterns	p. 65
3.4 Local Instance Analysis Inspired by Competition in Sports	p. 67
3.5 The Structure of Patterns in Multi-database Environments	p. 70
3.6 Effectiveness of Local Pattern Analysis	p. 73
3.7 Summary	p. 74
4 Identifying Quality Knowledge	p. 75
4.1 Introduction	p. 75
4.2 Problem Statement	p. 76
4.2.1 Problems Faced by Traditional Multi-database Mining	p. 76
4.2.2 Effectiveness of Identifying Quality Data	p. 78
4.2.3 Needed Concepts	p. 80
4.3 Nonstandard Interpretation	p. 82
4.4 Proof Theory	p. 88
4.5 Adding External Knowledge	p. 91
4.6 The Use of the Framework	p. 95
4.6.1 Applying to Real-world Applications	p. 95
4.6.2 Evaluating Veridicality	p. 96
4.7 Summary	p. 100
5 Database Clustering	p. 103
5.1 Introduction	p. 103
5.2 Effectiveness of Classifying	p. 104
5.3 Classifying Databases	p. 107
5.3.1 Features in Databases	p. 107
5.3.2 Similarity Measurement	p. 108
5.3.3 Relevance of Databases and Classification	p. 113
5.3.4 Ideal Classification and Goodness Measurement	p. 115
5.4 Searching for a Good Classification	p. 120
5.4.1 The First Step: Generating a Classification	p. 121
5.4.2 The Second Step: Searching for a Good Classification	p. 123
5.5 Algorithm Analysis	p. 127
5.5.1 Procedure GreedyClass	p. 127
5.5.2 Algorithm GoodClass	p. 129
5.6 Evaluation of Application-independent Database Classification	p. 130
5.6.1 Dataset Selection	p. 130
5.6.2 Experimental Results	p. 131
5.6.3 Analysis	p. 134
5.7 Summary	p. 135
6 Dealing with Inconsistency	p. 137
6.1 Introduction	p. 137
6.2 Problem Statement	p. 138
6.3 Definitions of Formal Semantics	p. 139
6.4 Weighted Majority	p. 143
6.5 Mastering Local Pattern Sets	p. 146
6.6 Examples of Synthesizing Local Pattern Sets	p. 148
6.7 A Syntactic Characterization	p. 150
6.8 Summary	p. 155
7 Identifying High-vote Patterns	p. 157
7.1 Introduction	p. 157
7.2 Illustration of High-vote Patterns	p. 158
7.3 Identifying High-vote Patterns	p. 161
7.4 Algorithm Design	p. 163
7.4.1 Searching for High-vote Patterns	p. 164
7.4.2 Identifying High-vote Patterns: An Example	p. 165
7.4.3 Algorithm Analysis	p. 167
7.5 Identifying High-vote Patterns Using a Fuzzy Logic Controller	p. 168
7.5.1 Needed Concepts in Fuzzy Logic	p. 168
7.5.2 System Analysis	p. 170
7.5.3 Setting Membership Functions for Input and Output Variables	p. 171
7.5.4 Setting Fuzzy Rules	p. 172
7.5.5 Fuzzification	p. 174
7.5.6 Inference and Rule Composition	p. 174
7.5.7 Defuzzification	p. 176
7.5.8 Algorithm Design	p. 177
7.6 High-vote Pattern Analysis	p. 178
7.6.1 Normal Distribution	p. 178
7.6.2 The Procedure of Clustering	p. 179
7.7 Suggested Patterns	p. 183
7.8 Summary	p. 183
8 Identifying Exceptional Patterns	p. 185
8.1 Introduction	p. 185
8.2 Interesting Exceptional Patterns	p. 186
8.2.1 Measuring the Interestingness	p. 186
8.2.2 Behavior of Interest Measurements	p. 189
8.3 Algorithm Design	p. 189
8.3.1 Algorithm Design	p. 189
8.3.2 Identifying Exceptions: An Example	p. 192
8.3.3 Algorithm Analysis	p. 193
8.4 Identifying Exceptions with a Fuzzy Logic Controller	p. 195
8.5 Summary	p. 195
9 Synthesizing Local Patterns by Weighting	p. 197
9.1 Introduction	p. 197
9.2 Problem Statement	p. 198
9.3 Synthesizing Rules by Weighting	p. 200
9.3.1 Weight of Evidence	p. 200
9.3.2 Solving Weights of Databases	p. 201
9.3.3 Algorithm Design	p. 205
9.4 Improvement of Synthesizing Model	p. 206
9.4.1 Effectiveness of Rule Selection	p. 206
9.4.2 Process of Rule Selection	p. 208
9.4.3 Optimized Algorithm	p. 210
9.5 Algorithm Analysis	p. 211
9.5.1 Procedure RuleSelection	p. 211
9.5.2 Algorithm RuleSynthesizing	p. 212
9.6 Summary	p. 213
10 Conclusions and Future Work	p. 215
10.1 Conclusions	p. 215
10.2 Future Work	p. 218
References	p. 221
Subject Index	p. 231
