Title:
The statistical evaluation of medical tests for classification and prediction
Series:
Oxford statistical science series
Publication Information:
Oxford : Oxford University Press, 2003
ISBN:
9780198509844

Item Barcode:
30000004990663
Call Number:
RB38.3 P46 2003
Material Type:
Open Access Book
Item Category 1:
Book

Summary

This book describes statistical techniques for the design and evaluation of research studies on medical diagnostic tests, screening tests, biomarkers and new technologies for classification and prediction in medicine. Grounded in mathematical theory, the book includes worked examples, with data and code, that make the methods easy to implement.


Author Notes

Margaret Sullivan Pepe is a Professor of Biostatistics at the University of Washington and the Fred Hutchinson Cancer Research Center, Washington, USA.


Table of Contents

Notation
1 Introduction
1.1 The medical test
1.1.1 Tests, classification and the broader context
1.1.2 Disease screening versus diagnosis
1.1.3 Criteria for a useful medical test
1.2 Elements of study design
1.2.1 Scale for the test result
1.2.2 Selection of study subjects
1.2.3 Comparing tests
1.2.4 Test integrity
1.2.5 Sources of bias
1.3 Examples and datasets
1.3.1 Overview
1.3.2 The CASS dataset
1.3.3 Pancreatic cancer serum biomarkers study
1.3.4 Hepatic metastasis ultrasound study
1.3.5 CARET PSA biomarker study
1.3.6 Ovarian cancer gene expression study
1.3.7 Neonatal audiology data
1.3.8 St Louis prostate cancer screening study
1.4 Topics and organization
1.5 Exercises
2 Measures of accuracy for binary tests
2.1 Measures of accuracy
2.1.1 Notation
2.1.2 Disease-specific classification probabilities
2.1.3 Predictive values
2.1.4 Diagnostic likelihood ratios
2.2 Estimating accuracy with data
2.2.1 Data from a cohort study
2.2.2 Proportions: (FPF, TPF) and (PPV, NPV)
2.2.3 Ratios of proportions: DLRs
2.2.4 Estimation from a case-control study
2.2.5 Merits of case-control versus cohort studies
2.3 Quantifying the relative accuracy of tests
2.3.1 Comparing classification probabilities
2.3.2 Comparing predictive values
2.3.3 Comparing diagnostic likelihood ratios
2.3.4 Which test is better?
2.4 Concluding remarks
2.5 Exercises
3 Comparing binary tests and regression analysis
3.1 Study designs for comparing tests
3.1.1 Unpaired designs
3.1.2 Paired designs
3.2 Comparing accuracy with unpaired data
3.2.1 Empirical estimators of comparative measures
3.2.2 Large sample inference
3.3 Comparing accuracy with paired data
3.3.1 Sources of correlation
3.3.2 Estimation of comparative measures
3.3.3 Wide or long data representations
3.3.4 Large sample inference
3.3.5 Efficiency of paired versus unpaired designs
3.3.6 Small sample properties
3.3.7 The CASS study
3.4 The regression modeling framework
3.4.1 Factors potentially affecting test performance
3.4.2 Questions addressed by regression modeling
3.4.3 Notation and general set-up
3.5 Regression for true and false positive fractions
3.5.1 Binary marginal GLM models
3.5.2 Fitting marginal models to data
3.5.3 Illustration: factors affecting test accuracy
3.5.4 Comparing tests with regression analysis
3.6 Regression modeling of predictive values
3.6.1 Model formulation and fitting
3.6.2 Comparing tests
3.6.3 The incremental value of a test for prediction
3.7 Regression models for DLRs
3.7.1 The model form
3.7.2 Fitting the DLR model
3.7.3 Comparing DLRs of two tests
3.7.4 Relationships with other regression models
3.8 Concluding remarks
3.9 Exercises
4 The receiver operating characteristic curve
4.1 The context
4.1.1 Examples of non-binary tests
4.1.2 Dichotomizing the test result
4.2 The ROC curve for continuous tests
4.2.1 Definition of the ROC
4.2.2 Mathematical properties of the ROC curve
4.2.3 Attributes of and uses for the ROC curve
4.2.4 Restrictions and alternatives to the ROC curve
4.3 Summary indices
4.3.1 The area under the ROC curve (AUC)
4.3.2 The ROC(t_0) and partial AUC
4.3.3 Other summary indices
4.3.4 Measures of distance between distributions
4.4 The binormal ROC curve
4.4.1 Functional form
4.4.2 The binormal AUC
4.4.3 The binormal assumption
4.5 The ROC for ordinal tests
4.5.1 Tests with ordered discrete results
4.5.2 The latent decision variable model
4.5.3 Identification of the latent variable ROC
4.5.4 Changes in accuracy versus thresholds
4.5.5 The discrete ROC curve
4.5.6 Summary measures for the discrete ROC curve
4.6 Concluding remarks
4.7 Exercises
5 Estimating the ROC curve
5.1 Introduction
5.1.1 Approaches
5.1.2 Notation and assumptions
5.2 Empirical estimation
5.2.1 The empirical estimator
5.2.2 Sampling variability at a threshold
5.2.3 Sampling variability of ROC_e(t)
5.2.4 The empirical AUC and other indices
5.2.5 Variability in the empirical AUC
5.2.6 Comparing empirical ROC curves
5.2.7 Illustration: pancreatic cancer biomarkers
5.2.8 Discrete ordinal data ROC curves
5.3 Modeling the test result distributions
5.3.1 Fully parametric modeling
5.3.2 Semiparametric location-scale models
5.3.3 Arguments against modeling test results
5.4 Parametric distribution-free methods: ordinal tests
5.4.1 The binormal latent variable framework
5.4.2 Fitting the discrete binormal ROC function
5.4.3 Generalizations and comparisons
5.5 Parametric distribution-free methods: continuous tests
5.5.1 LABROC
5.5.2 The ROC-GLM estimator
5.5.3 Inference with parametric distribution-free methods
5.6 Concluding remarks
5.7 Exercises
5.8 Proofs of theoretical results
6 Covariate effects on continuous and ordinal tests
6.1 How and why?
6.1.1 Notation
6.1.2 Aspects to model
6.1.3 Omitting covariates/pooling data
6.2 Reference distributions
6.2.1 Non-diseased as the reference population
6.2.2 The homogeneous population
6.2.3 Nonparametric regression quantiles
6.2.4 Parametric estimation of S_{D,Z}
6.2.5 Semiparametric models
6.2.6 Application
6.2.7 Ordinal test results
6.3 Modeling covariate effects on test results
6.3.1 The basic idea
6.3.2 Induced ROC curves for continuous tests
6.3.3 Semiparametric location-scale families
6.3.4 Induced ROC curves for ordinal tests
6.3.5 Random effect models for test results
6.4 Modeling covariate effects on ROC curves
6.4.1 The ROC-GLM regression model
6.4.2 Fitting the model to data
6.4.3 Comparing ROC curves
6.4.4 Three examples
6.5 Approaches to ROC regression
6.5.1 Modeling ROC summary indices
6.5.2 A qualitative comparison
6.6 Concluding remarks
6.7 Exercises
7 Incomplete data and imperfect reference tests
7.1 Verification biased sampling
7.1.1 Context and definition
7.1.2 The missing at random assumption
7.1.3 Correcting for bias with Bayes' theorem
7.1.4 Inverse probability weighting/imputation
7.1.5 Sampling variability of corrected estimates
7.1.6 Adjustments for other biasing factors
7.1.7 A broader context
7.1.8 Non-binary tests
7.2 Verification restricted to screen positives
7.2.1 Extreme verification bias
7.2.2 Identifiable parameters for a single test
7.2.3 Comparing tests
7.2.4 Evaluating covariate effects on (DP, FP)
7.2.5 Evaluating covariate effects on (TPF, FPF) and on prevalence
7.2.6 Evaluating covariate effects on (rTPF, rFPF)
7.2.7 Alternative strategies
7.3 Imperfect reference tests
7.3.1 Examples
7.3.2 Effects on accuracy parameters
7.3.3 Classic latent class analysis
7.3.4 Relaxing the conditional independence assumption
7.3.5 A critique of latent class analysis
7.3.6 Discrepant resolution
7.3.7 Composite reference standards
7.4 Concluding remarks
7.5 Exercises
7.6 Proofs of theoretical results
8 Study design and hypothesis testing
8.1 The phases of medical test development
8.1.1 Research as a process
8.1.2 Five phases for the development of a medical test
8.2 Sample sizes for phase 2 studies
8.2.1 Retrospective validation of a binary test
8.2.2 Retrospective validation of a continuous test
8.2.3 Sample size based on the AUC
8.2.4 Ordinal tests
8.3 Sample sizes for phase 3 studies
8.3.1 Comparing two binary tests--paired data
8.3.2 Comparing two binary tests--unpaired data
8.3.3 Evaluating population effects on test performance
8.3.4 Comparisons with continuous test results
8.3.5 Estimating the threshold for screen positivity
8.3.6 Remarks on phase 3 analyses
8.4 Sample sizes for phase 4 studies
8.4.1 Designs for inference about (FPF, TPF)
8.4.2 Designs for predictive values
8.4.3 Designs for (FP, DP)
8.4.4 Selected verification of screen negatives
8.5 Phase 5
8.6 Matching and stratification
8.6.1 Stratification
8.6.2 Matching
8.7 Concluding remarks
8.8 Exercises
9 More topics and conclusions
9.1 Meta-analysis
9.1.1 Goals of meta-analysis
9.1.2 Design of a meta-analysis study
9.1.3 The summary ROC curve
9.1.4 Binomial regression models
9.2 Incorporating the time dimension
9.2.1 The context
9.2.2 Incident cases and long-term controls
9.2.3 Interval cases and controls
9.2.4 Predictive values
9.2.5 Longitudinal measurements
9.3 Combining multiple test results
9.3.1 Boolean combinations
9.3.2 The likelihood ratio principle
9.3.3 Optimality of the risk score
9.3.4 Estimating the risk score
9.3.5 Development and assessment of the combination score
9.4 Concluding remarks
9.4.1 Topics we only mention
9.4.2 New applications and new technologies
9.5 Exercises
Bibliography
Index