Introduction to information retrieval

Library	Item Barcode	Call Number	Material Type	Item Category 1	Status
PSZ JB	30000010282977	QA76.9.T48 M36 2008	Open Access Book	Book	Unknown
PSZ KL	30000010293616	QA76.9.T48 M36 2008	Open Access Book	Book	Unknown

Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval from basic concepts, including web search and the related areas of text classification and text clustering. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it well suited to introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also appeal to researchers and professionals.

Author Notes

Christopher D. Manning is Associate Professor of Computer Science and Linguistics at Stanford University.
Prabhakar Raghavan is Head of Yahoo! Research and a Consulting Professor of Computer Science at Stanford University.
Hinrich Schütze is Chair of Theoretical Computational Linguistics at the Institute for Natural Language Processing, University of Stuttgart.

Reviews

Choice Review

This recent book on information retrieval (IR) is a timely one: no serious IR work has been published in years, perhaps decades, even though one would expect a flood, given that IR, as the fundamental theory behind search, is a very hot topic today. Manning (Stanford Univ.), Raghavan (Yahoo! Inc.), and Schütze (Univ. of Stuttgart, Germany) cover most of the important issues in IR very well, and offer more than enough coverage for a one-semester introductory IR course. Somewhat lacking is any serious discussion of non-text IR (images, sound, video, multimedia in general). This is an important, up-and-coming topic, and one that has not been adequately addressed in the literature. On the one hand, this introductory volume may not be the appropriate vehicle for such an advanced topic. On the other hand, while this reviewer was teaching an introductory course, he still allocated some time to give an overview of what he thinks is a major future challenge. Thus, it would have been nice to have this subject addressed here. But overall, the book serves its intended purpose quite well, and this reviewer plans to use it again in his next IR course. Summing Up: Highly recommended. Professional and academic collections, upper-division undergraduate and above. H. Levkowitz, University of Massachusetts

Table of Contents

Table of Notation
Preface
1 Boolean retrieval
1.1 An example information retrieval problem
1.2 A first take at building an inverted index
1.3 Processing Boolean queries
1.4 The extended Boolean model versus ranked retrieval
1.5 References and further reading
2 The term vocabulary and postings lists
2.1 Document delineation and character sequence decoding
2.2 Determining the vocabulary of terms
2.3 Faster postings list intersection via skip pointers
2.4 Positional postings and phrase queries
2.5 References and further reading
3 Dictionaries and tolerant retrieval
3.1 Search structures for dictionaries
3.2 Wildcard queries
3.3 Spelling correction
3.4 Phonetic correction
3.5 References and further reading
4 Index construction
4.1 Hardware basics
4.2 Blocked sort-based indexing
4.3 Single-pass in-memory indexing
4.4 Distributed indexing
4.5 Dynamic indexing
4.6 Other types of indexes
4.7 References and further reading
5 Index compression
5.1 Statistical properties of terms in information retrieval
5.2 Dictionary compression
5.3 Postings file compression
5.4 References and further reading
6 Scoring, term weighting, and the vector space model
6.1 Parametric and zone indexes
6.2 Term frequency and weighting
6.3 The vector space model for scoring
6.4 Variant tf-idf functions
6.5 References and further reading
7 Computing scores in a complete search system
7.1 Efficient scoring and ranking
7.2 Components of an information retrieval system
7.3 Vector space scoring and query operator interaction
7.4 References and further reading
8 Evaluation in information retrieval
8.1 Information retrieval system evaluation
8.2 Standard test collections
8.3 Evaluation of unranked retrieval sets
8.4 Evaluation of ranked retrieval results
8.5 Assessing relevance
8.6 A broader perspective: System quality and user utility
8.7 Results snippets
8.8 References and further reading
9 Relevance feedback and query expansion
9.1 Relevance feedback and pseudo relevance feedback
9.2 Global methods for query reformulation
9.3 References and further reading
10 XML retrieval
10.1 Basic XML concepts
10.2 Challenges in XML retrieval
10.3 A vector space model for XML retrieval
10.4 Evaluation of XML retrieval
10.5 Text-centric versus data-centric XML retrieval
10.6 References and further reading
11 Probabilistic information retrieval
11.1 Review of basic probability theory
11.2 The probability ranking principle
11.3 The binary independence model
11.4 An appraisal and some extensions
11.5 References and further reading
12 Language models for information retrieval
12.1 Language models
12.2 The query likelihood model
12.3 Language modeling versus other approaches in information retrieval
12.4 Extended language modeling approaches
12.5 References and further reading
13 Text classification and Naive Bayes
13.1 The text classification problem
13.2 Naive Bayes text classification
13.3 The Bernoulli model
13.4 Properties of Naive Bayes
13.5 Feature selection
13.6 Evaluation of text classification
13.7 References and further reading
14 Vector space classification
14.1 Document representations and measures of relatedness in vector spaces
14.2 Rocchio classification
14.3 k nearest neighbor
14.4 Linear versus nonlinear classifiers
14.5 Classification with more than two classes
14.6 The bias-variance tradeoff
14.7 References and further reading
15 Support vector machines and machine learning on documents
15.1 Support vector machines: The linearly separable case
15.2 Extensions to the support vector machine model
15.3 Issues in the classification of text documents
15.4 Machine-learning methods in ad hoc information retrieval
15.5 References and further reading
16 Flat clustering
16.1 Clustering in information retrieval
16.2 Problem statement
16.3 Evaluation of clustering
16.4 K-means
16.5 Model-based clustering
16.6 References and further reading
17 Hierarchical clustering
17.1 Hierarchical agglomerative clustering
17.2 Single-link and complete-link clustering
17.3 Group-average agglomerative clustering
17.4 Centroid clustering
17.5 Optimality of hierarchical agglomerative clustering
17.6 Divisive clustering
17.7 Cluster labeling
17.8 Implementation notes
17.9 References and further reading
18 Matrix decompositions and latent semantic indexing
18.1 Linear algebra review
18.2 Term-document matrices and singular value decompositions
18.3 Low-rank approximations
18.4 Latent semantic indexing
18.5 References and further reading
19 Web search basics
19.1 Background and history
19.2 Web characteristics
19.3 Advertising as the economic model
19.4 The search user experience
19.5 Index size and estimation
19.6 Near-duplicates and shingling
19.7 References and further reading
20 Web crawling and indexes
20.1 Overview
20.2 Crawling
20.3 Distributing indexes
20.4 Connectivity servers
20.5 References and further reading
21 Link analysis
21.1 The Web as a graph
21.2 PageRank
21.3 Hubs and authorities
21.4 References and further reading
Bibliography
Index
