Skip to:Content
|
Bottom
Cover image for Googles PageRank and beyond : the science of search engine rankings
Title:
Googles PageRank and beyond : the science of search engine rankings
Personal Author:
Publication Information:
Princeton, NJ : Princeton University Press, 2006
ISBN:
9780691122021

Available:*

Library
Item Barcode
Call Number
Material Type
Item Category 1
Status
Searching...
30000010151109 TK5105.885.G66 L36 2006 Open Access Book Book
Searching...

On Order

Summary

Summary

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions and more.


The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.


The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.


Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.


Many illustrative examples and entertaining asides
MATLAB code
Accessible and informal style
Complete and self-contained section for mathematics review


Author Notes

Amy N. Langville is Assistant Professor of Mathematics at the College of Charleston in Charleston, South Carolina. She studies mathematical algorithms for information retrieval and text and data mining applications. Carl D. Meyer is Professor of Mathematics at North Carolina State University. In addition to information retrieval, his research areas include numerical analysis, linear algebra, and Markov chains. He is the author of Matrix Analysis and Applied Linear Algebra .


Reviews 1

Library Journal Review

Langville (math, Coll. of Charleston, SC) and Meyer (math, North Carolina State Univ.; Matrix Analysis and Applied Linear Algebra) examine the logic, mathematics, and sophistication behind Google's PageRank and other Internet search engine ranking programs. The first five chapters, aimed at a general audience, are written clearly, concisely, and masterfully and are complemented with figures, asides, and interesting summaries. They successfully introduce such web-searching principles as crawling, linking, and indexing together with basic algebraic ideas. The remaining ten chapters quickly descend into a series of discussions of technical computational and applied mathematical concepts. These latter chapters, whose text includes figures, references, Matlab computer coding, theorems, and mathematical proofs, explore PageRank programming properties based on well-established algebraic, matrix, and computational algorithms. This title should not be confused with books on the business of search engines, the history of the Internet, or how to effectively search the web. It is an excellent work complete with an index, an exhaustive bibliography, and a glossary that cries out to have been written in an online format, with hypertext links, programmable images, mathematical proofs, and a toolbox modeling key algebraic concepts. Suitable for large public and essential for all academic libraries. Ian D. Gordon, Brock Univ. Lib., St. Catharines, Ont. (c) Copyright 2010. Library Journals LLC, a wholly owned subsidiary of Media Source, Inc. No redistribution permitted.


Table of Contents

Preface ix
Chapter 1 Introduction to Web Search Engines
1 1.1 A Short History of Information Retrieval
1 1.2 An Overview of Traditional Information Retrieval
5 1.3 Web Information Retrievalp. 9
Chapter 2 Crawling, Indexing, and Query Processing
15 2.1 Crawling
15 2.1 x The Content Index
19 2.3 Query Processingp. 21
Chapter 3 Ranking Webpages by Popularity
25 3.1 The Scene in 1998
25 3.2 Two Theses
26 3.3 Query-Independencep. 30
Chapter 4 The Mathematics of Google's PageRankp. 31
4.1 The Original Summation Formula for PageRankp. 32
4.2 Matrix Representation of the Summation Equationsp. 33
4.3 Problems with the Iterative Processp. 34
4.4 A Little Markov Chain Theoryp. 36
4.5 Early Adjustments to the Basic Modelp. 36
4.6 Computation of the PageRank Vectorp. 39
4.7 Theorem and Proof for Spectrum of the Google Matrixp. 45
Chapter 5 Parameters in the PageRank Modelp. 47
5.1 The alpha; Factorp. 47
5.2 The Hyperlink Matrix Hp. 48
5.3 The Teleportation Matrix Ep. 49
Chapter 6 The Sensitivity of PageRankp. 57
6.1 Sensitivity with respect to alpha;p. 57
6.2 Sensitivity with respect to Hp. 62
6.3 Sensitivity with respect to v Tp. 63
6.4 Other Analyses of Sensitivityp. 63
6.5 Sensitivity Theorems and Proofsp. 66
Chapter 7 The PageRank Problem as a Linear Systemp. 71
7.1 Properties of (I -- alhpa;S)p. 71
7.2 Properties of (I -- alpha;H)p. 72
7.3 Proof of the PageRank Sparse Linear Systemp. 73
Chapter 8 Issues in Large-Scale Implementation of PageRankp. 75
8.1 Storage Issuesp. 75
8.2 Convergence Criterionp. 79
8.3 Accuracyp. 79
8.4 Dangling Nodesp. 80
8.5 Back Button Modelingp. 84
Chapter 9 Accelerating the Computation of PageRankp. 89
9.1 An Adaptive Power Methodp. 89
9.2 Extrapolationp. 90
9.3 Aggregationp. 94
9.4 Other Numerical Methodsp. 97
Chapter 10 Updating the PageRank Vectorp. 99
10.1 The Two Updating Problems and their Historyp. 100
10.2 Restarting the Power Methodp. 101
10.3 Approximate Updating Using Approximate Aggregationp. 102
10.4 Exact Aggregationp. 104
10.5 Exact vs. Approximate Aggregationp. 105
10.6 Updating with Iterative Aggregationp. 107
10.7 Determining the Partitionp. 109
10.8 Conclusionsp. 111
Chapter 11 The HITS Method for Ranking Webpages 115
11.1 The HITS Algorithmp. 115
11.2 HITS Implementationp. 117
11.3 HITS Convergencep. 119
11.4 HITS Examplep. 120
11.5 Strengths and Weaknesses of HITSp. 122
11.6 HITS's Relationship to Bibliometricsp. 123
11.7 Query-Independent HITSp. 124
11.8 Accelerating HITSp. 126
11.9 HITS Sensitivityp. 126
Chapter 12 Other Link Methods for Ranking Webpagesp. 131
12.1 SALSAp. 131
12.2 Hybrid Ranking Methodsp. 135
12.3 Rankings based on Traffic Flowp. 136
Chapter 13 The Future of Web Information Retrievalp. 139
3.1 Spamp. 139
3.2 Personalizationp. 142
3.3 Clusteringp. 142
3.4 Intelligent Agentsp. 143
3.5 Trends and Time-Sensitive Searchp. 144
3.6 Privacy and Censorshipp. 146
3.7 Library Classification Schemesp. 147
3.8 Data Fusionp. 148
Chapter 14 Resources for Web Information Retrievalp. 149
14.1 Resources for Getting Startedp. 149
14.2 Resources for Serious Studyp. 150
Chapter 15 The Mathematics Guidep. 153
15.1 Linear Algebrap. 153
15.2 Perron-Frobenius Theoryp. 167
15.3 Markov Chainsp. 175
15.4 Perron Complementationp. 186
15.5 Stochastic Complementationp. 192
15.6 Censoringp. 194
15.7 Aggregationp. 195
15.8 Disaggregationp. 198
Chapter 16 Chapter 16: Glossaryp. 201
Bibliographyp. 207
Indexp. 219
Go to:Top of Page