Available:*
Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status |
---|---|---|---|---|---|
Searching... | 30000010151109 | TK5105.885.G66 L36 2006 | Open Access Book | Book | Searching... |
On Order
Summary
Summary
Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions and more.
The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.
The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.
Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.
Many illustrative examples and entertaining asides
MATLAB code
Accessible and informal style
Complete and self-contained section for mathematics review
Author Notes
Amy N. Langville is Assistant Professor of Mathematics at the College of Charleston in Charleston, South Carolina. She studies mathematical algorithms for information retrieval and text and data mining applications. Carl D. Meyer is Professor of Mathematics at North Carolina State University. In addition to information retrieval, his research areas include numerical analysis, linear algebra, and Markov chains. He is the author of Matrix Analysis and Applied Linear Algebra .
Reviews 1
Library Journal Review
Langville (math, Coll. of Charleston, SC) and Meyer (math, North Carolina State Univ.; Matrix Analysis and Applied Linear Algebra) examine the logic, mathematics, and sophistication behind Google's PageRank and other Internet search engine ranking programs. The first five chapters, aimed at a general audience, are written clearly, concisely, and masterfully and are complemented with figures, asides, and interesting summaries. They successfully introduce such web-searching principles as crawling, linking, and indexing together with basic algebraic ideas. The remaining ten chapters quickly descend into a series of discussions of technical computational and applied mathematical concepts. These latter chapters, whose text includes figures, references, Matlab computer coding, theorems, and mathematical proofs, explore PageRank programming properties based on well-established algebraic, matrix, and computational algorithms. This title should not be confused with books on the business of search engines, the history of the Internet, or how to effectively search the web. It is an excellent work complete with an index, an exhaustive bibliography, and a glossary that cries out to have been written in an online format, with hypertext links, programmable images, mathematical proofs, and a toolbox modeling key algebraic concepts. Suitable for large public and essential for all academic libraries. Ian D. Gordon, Brock Univ. Lib., St. Catharines, Ont. (c) Copyright 2010. Library Journals LLC, a wholly owned subsidiary of Media Source, Inc. No redistribution permitted.
Table of Contents
Preface ix | |
Chapter 1 Introduction to Web Search Engines | |
1 1.1 A Short History of Information Retrieval | |
1 1.2 An Overview of Traditional Information Retrieval | |
5 1.3 Web Information Retrieval | p. 9 |
Chapter 2 Crawling, Indexing, and Query Processing | |
15 2.1 Crawling | |
15 2.1 x The Content Index | |
19 2.3 Query Processing | p. 21 |
Chapter 3 Ranking Webpages by Popularity | |
25 3.1 The Scene in 1998 | |
25 3.2 Two Theses | |
26 3.3 Query-Independence | p. 30 |
Chapter 4 The Mathematics of Google's PageRank | p. 31 |
4.1 The Original Summation Formula for PageRank | p. 32 |
4.2 Matrix Representation of the Summation Equations | p. 33 |
4.3 Problems with the Iterative Process | p. 34 |
4.4 A Little Markov Chain Theory | p. 36 |
4.5 Early Adjustments to the Basic Model | p. 36 |
4.6 Computation of the PageRank Vector | p. 39 |
4.7 Theorem and Proof for Spectrum of the Google Matrix | p. 45 |
Chapter 5 Parameters in the PageRank Model | p. 47 |
5.1 The alpha; Factor | p. 47 |
5.2 The Hyperlink Matrix H | p. 48 |
5.3 The Teleportation Matrix E | p. 49 |
Chapter 6 The Sensitivity of PageRank | p. 57 |
6.1 Sensitivity with respect to alpha; | p. 57 |
6.2 Sensitivity with respect to H | p. 62 |
6.3 Sensitivity with respect to v T | p. 63 |
6.4 Other Analyses of Sensitivity | p. 63 |
6.5 Sensitivity Theorems and Proofs | p. 66 |
Chapter 7 The PageRank Problem as a Linear System | p. 71 |
7.1 Properties of (I -- alhpa;S) | p. 71 |
7.2 Properties of (I -- alpha;H) | p. 72 |
7.3 Proof of the PageRank Sparse Linear System | p. 73 |
Chapter 8 Issues in Large-Scale Implementation of PageRank | p. 75 |
8.1 Storage Issues | p. 75 |
8.2 Convergence Criterion | p. 79 |
8.3 Accuracy | p. 79 |
8.4 Dangling Nodes | p. 80 |
8.5 Back Button Modeling | p. 84 |
Chapter 9 Accelerating the Computation of PageRank | p. 89 |
9.1 An Adaptive Power Method | p. 89 |
9.2 Extrapolation | p. 90 |
9.3 Aggregation | p. 94 |
9.4 Other Numerical Methods | p. 97 |
Chapter 10 Updating the PageRank Vector | p. 99 |
10.1 The Two Updating Problems and their History | p. 100 |
10.2 Restarting the Power Method | p. 101 |
10.3 Approximate Updating Using Approximate Aggregation | p. 102 |
10.4 Exact Aggregation | p. 104 |
10.5 Exact vs. Approximate Aggregation | p. 105 |
10.6 Updating with Iterative Aggregation | p. 107 |
10.7 Determining the Partition | p. 109 |
10.8 Conclusions | p. 111 |
Chapter 11 The HITS Method for Ranking Webpages 115 | |
11.1 The HITS Algorithm | p. 115 |
11.2 HITS Implementation | p. 117 |
11.3 HITS Convergence | p. 119 |
11.4 HITS Example | p. 120 |
11.5 Strengths and Weaknesses of HITS | p. 122 |
11.6 HITS's Relationship to Bibliometrics | p. 123 |
11.7 Query-Independent HITS | p. 124 |
11.8 Accelerating HITS | p. 126 |
11.9 HITS Sensitivity | p. 126 |
Chapter 12 Other Link Methods for Ranking Webpages | p. 131 |
12.1 SALSA | p. 131 |
12.2 Hybrid Ranking Methods | p. 135 |
12.3 Rankings based on Traffic Flow | p. 136 |
Chapter 13 The Future of Web Information Retrieval | p. 139 |
3.1 Spam | p. 139 |
3.2 Personalization | p. 142 |
3.3 Clustering | p. 142 |
3.4 Intelligent Agents | p. 143 |
3.5 Trends and Time-Sensitive Search | p. 144 |
3.6 Privacy and Censorship | p. 146 |
3.7 Library Classification Schemes | p. 147 |
3.8 Data Fusion | p. 148 |
Chapter 14 Resources for Web Information Retrieval | p. 149 |
14.1 Resources for Getting Started | p. 149 |
14.2 Resources for Serious Study | p. 150 |
Chapter 15 The Mathematics Guide | p. 153 |
15.1 Linear Algebra | p. 153 |
15.2 Perron-Frobenius Theory | p. 167 |
15.3 Markov Chains | p. 175 |
15.4 Perron Complementation | p. 186 |
15.5 Stochastic Complementation | p. 192 |
15.6 Censoring | p. 194 |
15.7 Aggregation | p. 195 |
15.8 Disaggregation | p. 198 |
Chapter 16 Chapter 16: Glossary | p. 201 |
Bibliography | p. 207 |
Index | p. 219 |