Title:
Nonparametric analysis of univariate heavy-tailed data : research and practice
Personal Author:
Natalia Markovich
Series:
Wiley series in probability and statistics
Publication Information:
West Sussex, England : John Wiley & Sons, 2007
Physical Description:
xxi, 310 p. : ill. ; 24 cm.
ISBN:
9780470510872

Item Barcode:
30000010185823
Call Number:
QA278.8 M37 2007
Material Type:
Open Access Book

Summary

Heavy-tailed distributions are typical for phenomena in complex multi-component systems such as biometry, economics, ecological systems, sociology, web access statistics, internet traffic, bibliometrics, finance and business. The analysis of such distributions requires special methods of estimation due to their specific features. These are not only the slow decay to zero of the tail, but also the violation of Cramér's condition, possible non-existence of some moments, and sparse observations in the tail of the distribution.

The book focuses on the methods of statistical analysis of heavy-tailed independent identically distributed random variables by empirical samples of moderate sizes. It provides a detailed survey of classical results and recent developments in the theory of nonparametric estimation of the probability density function, the tail index, the hazard rate and the renewal function.

Both asymptotic results, for example convergence rates of the estimates, and results for samples of moderate sizes supported by Monte Carlo investigation, are considered. The text is illustrated by the application of the considered methodologies to real data of web traffic measurements.


Author Notes

Natalia Markovich, Institute of Control Sciences, Russian Academy of Sciences, Moscow

Having been the Leading Scientist at the Institute of Control Sciences for the last eleven years, Dr Markovich has had much experience in this area. An extremely active member of the statistical community, she has presented many seminars and invited talks, as well as being involved in numerous international research projects. She has published over 50 articles and has written chapters in two books, for Springer-Verlag and Elsevier.


Table of Contents

Preface
1 Definitions and rough detection of tail heaviness
1.1 Definitions and basic properties of classes of heavy-tailed distributions
1.2 Tail index estimation
1.2.1 Estimators of a positive-valued tail index
1.2.2 The choice of k in Hill's estimator
1.2.3 Estimators of a real-valued tail index
1.2.4 On-line estimation of the tail index
1.3 Detection of tail heaviness and dependence
1.3.1 Rough tests of tail heaviness
1.3.2 Analysis of Web traffic and TCP flow data
1.3.3 Dependence detection from univariate data
1.3.4 Dependence detection from bivariate data
1.3.5 Bivariate analysis of TCP flow data
1.4 Notes and comments
1.5 Exercises
2 Classical methods of probability density estimation
2.1 Principles of density estimation
2.2 Methods of density estimation
2.2.1 Kernel estimators
2.2.2 Projection estimators
2.2.3 Spline estimators
2.2.4 Smoothing methods
2.2.5 Illustrative examples
2.3 Kernel estimation from dependent data
2.3.1 Statement of the problem
2.3.2 Numerical calculation of the bandwidth
2.3.3 Data-driven selection of the bandwidth
2.4 Applications
2.4.1 Finance: evaluation of market risk
2.4.2 Telecommunications
2.4.3 Population analysis
2.5 Exercises
3 Heavy-tailed density estimation
3.1 Problems of the estimation of heavy-tailed densities
3.2 Combined parametric-nonparametric method
3.2.1 Nonparametric estimation of the density by structural risk minimization
3.2.2 Illustrative examples
3.2.3 Web data analysis by a combined parametric-nonparametric method
3.3 Barron's estimator and χ²-optimality
3.4 Kernel estimators with variable bandwidth
3.5 Retransformed nonparametric estimators
3.6 Exercises
4 Transformations and heavy-tailed density estimation
4.1 Problems of data transformations
4.2 Estimates based on a fixed transformation
4.3 Estimates based on an adaptive transformation
4.3.1 Estimation algorithm
4.3.2 Analysis of the algorithm
4.3.3 Further remarks
4.4 Estimating the accuracy of retransformed estimates
4.5 Boundary kernels
4.6 Accuracy of a nonvariable bandwidth kernel estimator
4.7 The D method for a nonvariable bandwidth kernel estimator
4.8 The D method for a variable bandwidth kernel estimator
4.8.1 Method and results
4.8.2 Application to Web traffic characteristics
4.9 The ω² method for the projection estimator
4.10 Exercises
5 Classification and retransformed density estimates
5.1 Classification and quality of density estimation
5.2 Convergence of the estimated probability of misclassification
5.3 Simulation study
5.4 Application of the classification technique to Web data analysis
5.4.1 Intelligent browser
5.4.2 Web data analysis by traffic classification
5.4.3 Web prefetching
5.5 Exercises
6 Estimation of high quantiles
6.1 Introduction
6.2 Estimators of high quantiles
6.3 Distribution of high quantile estimates
6.4 Simulation study
6.4.1 Comparison of high quantile estimates in terms of relative bias and mean squared error
6.4.2 Comparison of high quantile estimates in terms of confidence intervals
6.5 Application to Web traffic data
6.6 Exercises
7 Nonparametric estimation of the hazard rate function
7.1 Definition of the hazard rate function
7.2 Statistical regularization method
7.3 Numerical solution of ill-posed problems
7.4 Estimation of the hazard rate function of heavy-tailed distributions
7.5 Hazard rate estimation for compactly supported distributions
7.5.1 Estimation of the hazard rate from the simplest equations
7.5.2 Estimation of the hazard rate from a special kernel equation
7.6 Estimation of the ratio of hazard rates
7.6.1 Failure time detection
7.6.2 Hormesis detection
7.7 Hazard rate estimation in teletraffic theory
7.7.1 Teletraffic processes at the packet level
7.7.2 Estimation of the intensity of a nonhomogeneous Poisson process
7.8 Semi-Markov modeling in teletraffic engineering
7.8.1 The Gilbert-Elliott model
7.8.2 Estimation of a retrial process
7.9 Exercises
8 Nonparametric estimation of the renewal function
8.1 Traffic modeling by recurrent marked point processes
8.2 Introduction to renewal function estimation
8.3 Histogram-type estimator of the renewal function
8.4 Convergence of the histogram-type estimator
8.5 Selection of k by a bootstrap method
8.6 Selection of k by a plot
8.7 Simulation study
8.8 Application to the inter-arrival times of TCP connections
8.9 Conclusions and discussion
8.10 Exercises
Appendices
A Proofs of Chapter 2
B Proofs of Chapter 4
C Proofs of Chapter 5
D Proofs of Chapter 6
E Proofs of Chapter 7
F Proofs of Chapter 8
List of Main Symbols and Abbreviations
References
Index