Title:
Introduction to information retrieval
Personal Author:
Manning, Christopher D.
Publication Information:
New York : Cambridge University Press, c2008
Physical Description:
xxi, 482 p. : ill. ; 27 cm.
ISBN:
9780521865715

Available:*

Item Barcode    Call Number          Material Type     Item Category 1
30000010282977  QA76.9.T48 M36 2008  Open Access Book  Book
30000010293616  QA76.9.T48 M36 2008  Open Access Book  Book

Summary

Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering, from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it well suited to introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also interest researchers and professionals.


Author Notes

Christopher D. Manning is Associate Professor of Computer Science and Linguistics at Stanford University.
Prabhakar Raghavan is Head of Yahoo! Research and a Consulting Professor of Computer Science at Stanford University.
Hinrich Schütze is Chair of Theoretical Computational Linguistics at the Institute for Natural Language Processing, University of Stuttgart.


Reviews

Choice Review

This recent book on information retrieval (IR) is timely: no serious IR work has been published in years, perhaps decades, even though one would expect a flood, since IR, as the fundamental theory behind search, is a very hot topic today. Manning (Stanford Univ.), Raghavan (Yahoo! Inc.), and Schütze (Univ. of Stuttgart, Germany) cover most of the important issues in IR very well, and offer more than enough coverage for a one-semester introductory IR course. Somewhat lacking is any serious discussion of non-text IR (images, sound, video, multimedia in general). This is an important, up-and-coming topic, and one that has not been adequately addressed in the literature. On the one hand, this introductory volume may not be the appropriate vehicle for such an advanced topic. On the other hand, when this reviewer taught an introductory course, he still allocated some time to an overview of what he thinks is a major future challenge. Thus, it would have been nice to have this subject addressed here. But overall, the book serves its intended purpose quite well, and this reviewer plans to use it again in his next IR course. Summing Up: Highly recommended. Professional and academic collections, upper-division undergraduate and above. H. Levkowitz, University of Massachusetts


Table of Contents

Table of Notation p. xi
Preface p. xv
1 Boolean retrieval p. 1
1.1 An example information retrieval problem p. 3
1.2 A first take at building an inverted index p. 6
1.3 Processing Boolean queries p. 9
1.4 The extended Boolean model versus ranked retrieval p. 13
1.5 References and further reading p. 16
2 The term vocabulary and postings lists p. 18
2.1 Document delineation and character sequence decoding p. 18
2.2 Determining the vocabulary of terms p. 21
2.3 Faster postings list intersection via skip pointers p. 33
2.4 Positional postings and phrase queries p. 36
2.5 References and further reading p. 43
3 Dictionaries and tolerant retrieval p. 45
3.1 Search structures for dictionaries p. 45
3.2 Wildcard queries p. 48
3.3 Spelling correction p. 52
3.4 Phonetic correction p. 58
3.5 References and further reading p. 59
4 Index construction p. 61
4.1 Hardware basics p. 62
4.2 Blocked sort-based indexing p. 63
4.3 Single-pass in-memory indexing p. 66
4.4 Distributed indexing p. 68
4.5 Dynamic indexing p. 71
4.6 Other types of indexes p. 73
4.7 References and further reading p. 76
5 Index compression p. 78
5.1 Statistical properties of terms in information retrieval p. 79
5.2 Dictionary compression p. 82
5.3 Postings file compression p. 87
5.4 References and further reading p. 97
6 Scoring, term weighting, and the vector space model p. 100
6.1 Parametric and zone indexes p. 101
6.2 Term frequency and weighting p. 107
6.3 The vector space model for scoring p. 110
6.4 Variant tf-idf functions p. 116
6.5 References and further reading p. 122
7 Computing scores in a complete search system p. 124
7.1 Efficient scoring and ranking p. 124
7.2 Components of an information retrieval system p. 132
7.3 Vector space scoring and query operator interaction p. 136
7.4 References and further reading p. 137
8 Evaluation in information retrieval p. 139
8.1 Information retrieval system evaluation p. 140
8.2 Standard test collections p. 141
8.3 Evaluation of unranked retrieval sets p. 142
8.4 Evaluation of ranked retrieval results p. 145
8.5 Assessing relevance p. 151
8.6 A broader perspective: System quality and user utility p. 154
8.7 Results snippets p. 157
8.8 References and further reading p. 159
9 Relevance feedback and query expansion p. 162
9.1 Relevance feedback and pseudo relevance feedback p. 163
9.2 Global methods for query reformulation p. 173
9.3 References and further reading p. 177
10 XML retrieval p. 178
10.1 Basic XML concepts p. 180
10.2 Challenges in XML retrieval p. 183
10.3 A vector space model for XML retrieval p. 188
10.4 Evaluation of XML retrieval p. 192
10.5 Text-centric versus data-centric XML retrieval p. 196
10.6 References and further reading p. 198
11 Probabilistic information retrieval p. 201
11.1 Review of basic probability theory p. 202
11.2 The probability ranking principle p. 203
11.3 The binary independence model p. 204
11.4 An appraisal and some extensions p. 212
11.5 References and further reading p. 216
12 Language models for information retrieval p. 218
12.1 Language models p. 218
12.2 The query likelihood model p. 223
12.3 Language modeling versus other approaches in information retrieval p. 229
12.4 Extended language modeling approaches p. 230
12.5 References and further reading p. 232
13 Text classification and Naive Bayes p. 234
13.1 The text classification problem p. 237
13.2 Naive Bayes text classification p. 238
13.3 The Bernoulli model p. 243
13.4 Properties of Naive Bayes p. 245
13.5 Feature selection p. 251
13.6 Evaluation of text classification p. 258
13.7 References and further reading p. 264
14 Vector space classification p. 266
14.1 Document representations and measures of relatedness in vector spaces p. 267
14.2 Rocchio classification p. 269
14.3 k nearest neighbor p. 273
14.4 Linear versus nonlinear classifiers p. 277
14.5 Classification with more than two classes p. 281
14.6 The bias-variance tradeoff p. 284
14.7 References and further reading p. 291
15 Support vector machines and machine learning on documents p. 293
15.1 Support vector machines: The linearly separable case p. 294
15.2 Extensions to the support vector machine model p. 300
15.3 Issues in the classification of text documents p. 307
15.4 Machine-learning methods in ad hoc information retrieval p. 314
15.5 References and further reading p. 318
16 Flat clustering p. 321
16.1 Clustering in information retrieval p. 322
16.2 Problem statement p. 326
16.3 Evaluation of clustering p. 327
16.4 K-means p. 331
16.5 Model-based clustering p. 338
16.6 References and further reading p. 343
17 Hierarchical clustering p. 346
17.1 Hierarchical agglomerative clustering p. 347
17.2 Single-link and complete-link clustering p. 350
17.3 Group-average agglomerative clustering p. 356
17.4 Centroid clustering p. 358
17.5 Optimality of hierarchical agglomerative clustering p. 360
17.6 Divisive clustering p. 362
17.7 Cluster labeling p. 363
17.8 Implementation notes p. 365
17.9 References and further reading p. 367
18 Matrix decompositions and latent semantic indexing p. 369
18.1 Linear algebra review p. 369
18.2 Term-document matrices and singular value decompositions p. 373
18.3 Low-rank approximations p. 376
18.4 Latent semantic indexing p. 378
18.5 References and further reading p. 383
19 Web search basics p. 385
19.1 Background and history p. 385
19.2 Web characteristics p. 387
19.3 Advertising as the economic model p. 392
19.4 The search user experience p. 395
19.5 Index size and estimation p. 396
19.6 Near-duplicates and shingling p. 400
19.7 References and further reading p. 404
20 Web crawling and indexes p. 405
20.1 Overview p. 405
20.2 Crawling p. 406
20.3 Distributing indexes p. 415
20.4 Connectivity servers p. 416
20.5 References and further reading p. 419
21 Link analysis p. 421
21.1 The Web as a graph p. 422
21.2 PageRank p. 424
21.3 Hubs and authorities p. 433
21.4 References and further reading p. 439
Bibliography p. 441
Index p. 469