Unified methods for censored longitudinal data and causality

Over the last few decades, there has been an explosion in computation and information technology. This development has come with an expansion of complex observational studies and clinical trials in fields such as medicine, biology, epidemiology, sociology, and economics, among many others, which involve collecting large amounts of data on subjects or organisms over time. The goal of such studies can be formulated as estimation of a finite-dimensional parameter of the population distribution corresponding to the observed time-dependent process. Such estimation problems arise in survival analysis, causal inference, and regression analysis. This book provides a fundamental statistical framework for the analysis of complex longitudinal data. It provides the first comprehensive description of optimal estimation techniques based on time-dependent data structures subject to informative censoring and treatment assignment in so-called semiparametric models. Semiparametric models are particularly attractive since they allow the presence of large unmodeled nuisance parameters. These techniques include estimation of regression parameters in the familiar (multivariate) generalized linear regression and multiplicative intensity models. They go beyond standard statistical approaches by incorporating all the observed data, both to allow for informative censoring and to obtain maximal efficiency, and by developing estimators of causal effects. The book can be used to teach master's and Ph.D. students in biostatistics and statistics and is suitable for researchers in statistics with a strong interest in the analysis of complex longitudinal data.

Author Notes

Mark J. van der Laan is Professor of Biostatistics and Statistics at the University of California, Berkeley, and is a prominent researcher in the area of censored data and causality. His methodological research is inspired by collaborations with biologists, epidemiologists, and medical researchers. He has designed courses in survival analysis, censored data, causal inference, and statistical methods in computational biology, and advises a group of Ph.D. students in these fields. He is currently Associate Editor of the statistical journals Biometrics, Journal of Statistical Planning and Inference, and Statistical Applications in Genetics and Molecular Biology.
James M. Robins is the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology and Professor of Biostatistics at the Harvard School of Public Health. Over the past two decades, Professor Robins has developed novel statistical methods for inferring the causal effects of time-varying treatments or exposures from both observational and experimental data, and for appropriately adjusting for missing and censored data in very high-dimensional statistical models.

Preface	p. v
Notation	p. 1
1 Introduction	p. 8
1.1 Motivation, Bibliographic History, and an Overview of the book	p. 8
1.2 Tour through the General Estimation Problem	p. 16
1.2.1 Estimation in a high-dimensional full data model	p. 17
1.2.2 The curse of dimensionality in the full data model	p. 21
1.2.3 Coarsening at random	p. 23
1.2.4 The curse of dimensionality revisited	p. 27
1.2.5 The observed data model	p. 40
1.2.6 General method for construction of locally efficient estimators	p. 40
1.2.7 Comparison with maximum likelihood estimation	p. 45
1.3 Example: Causal Effect of Air Pollution on Short-Term Asthma Response	p. 48
1.4 Estimating Functions	p. 55
1.4.1 Orthogonal complement of a nuisance tangent space	p. 55
1.4.2 Review of efficiency theory	p. 61
1.4.3 Estimating functions	p. 62
1.4.4 Orthogonal complement of a nuisance tangent space in an observed data model	p. 64
1.4.5 Basic useful results to compute projections	p. 68
1.5 Robustness of Estimating Functions	p. 69
1.5.1 Robustness of estimating functions against misspecification of linear convex nuisance parameters	p. 69
1.5.2 Double robustness of observed data estimating functions	p. 77
1.5.3 Understanding double robustness for a general semiparametric model	p. 79
1.6 Doubly robust estimation in censored data models	p. 81
1.7 Using Cross-Validation to Select Nuisance Parameter Models	p. 93
1.7.1 A semiparametric model selection criterion	p. 94
1.7.2 Forward/backward selection of a nuisance parameter model based on cross-validation with respect to the parameter of interest	p. 97
1.7.3 Data analysis example: Estimating the causal relationship between boiled water use and diarrhea in HIV-positive men	p. 99
2 General Methodology	p. 102
2.1 The General Model and Overview	p. 102
2.2 Full Data Estimating Functions	p. 103
2.2.1 Orthogonal complement of the nuisance tangent space in the multivariate generalized linear regression model (MGLM)	p. 105
2.2.2 Orthogonal complement of the nuisance tangent space in the multiplicative intensity model	p. 107
2.2.3 Linking the orthogonal complement of the nuisance tangent space to estimating functions	p. 111
2.3 Mapping into Observed Data Estimating Functions	p. 114
2.3.1 Initial mappings and reparametrizing the full data estimating functions	p. 114
2.3.2 Initial mapping indexed by censoring and protected nuisance parameter	p. 124
2.3.3 Extending a mapping for a restricted censoring model to a complete censoring model	p. 125
2.3.4 Inverse weighting a mapping developed for a restricted censoring model	p. 126
2.3.5 Beating a given RAL estimator	p. 128
2.3.6 Orthogonalizing an initial mapping w.r.t. G: Double robustness	p. 131
2.3.7 Ignoring information on the censoring mechanism improves efficiency	p. 135
2.4 Optimal Mapping into Observed Data Estimating Functions	p. 137
2.4.1 The corresponding estimating equation	p. 139
2.4.2 Discussion of ingredients of a one-step estimator	p. 141
2.5 Guaranteed Improvement Relative to an Initial Estimating Function	p. 142
2.6 Construction of Confidence Intervals	p. 144
2.7 Asymptotics of the One-Step Estimator	p. 145
2.7.1 Asymptotics assuming consistent estimation of the censoring mechanism	p. 146
2.7.2 Proof of Theorem 2.4	p. 150
2.7.3 Asymptotics assuming that either the censoring mechanism or the full data distribution is estimated consistently	p. 151
2.7.4 Proof of Theorem 2.5	p. 152
2.8 The Optimal Index	p. 153
2.8.1 Finding the optimal estimating function among a given class of estimating functions	p. 159
2.9 Estimation of the Optimal Index	p. 166
2.9.1 Reparametrizing the representations of the optimal full data function	p. 167
2.9.2 Estimation of the optimal full data structure estimating function	p. 169
2.10 Locally Efficient Estimation with Score-Operator Representation	p. 170
3 Monotone Censored Data	p. 172
3.1 Data Structure and Model	p. 172
3.1.1 Cause-specific censoring	p. 175
3.2 Examples	p. 176
3.2.1 Right-censored data on a survival time	p. 176
3.2.2 Right-censored data on quality-adjusted survival time	p. 177
3.2.3 Right-censored data on a survival time with reporting delay	p. 179
3.2.4 Univariately right-censored multivariate failure time data	p. 181
3.3 Inverse Probability Censoring Weighted (IPCW) Estimators	p. 183
3.3.1 Identifiability condition	p. 183
3.3.2 Estimation of a marginal multiplicative intensity model	p. 184
3.3.3 Extension to proportional rate models	p. 191
3.3.4 Projecting on the tangent space of the Cox proportional hazards model of the censoring mechanism	p. 192
3.4 Optimal Mapping into Estimating Functions	p. 195
3.5 Estimation of Q	p. 196
3.5.1 Regression approach: Assuming that the censoring mechanism is correctly specified	p. 197
3.5.2 Maximum likelihood estimation according to a multiplicative intensity model: Doubly robust	p. 198
3.5.3 Maximum likelihood estimation for discrete models: Doubly robust	p. 200
3.5.4 Regression approach: Doubly robust	p. 201
3.6 Estimation of the Optimal Index	p. 204
3.6.1 The multivariate generalized regression model	p. 205
3.6.2 The multivariate generalized regression model when covariates are always observed	p. 206
3.7 Multivariate failure time regression model	p. 208
3.8 Simulation and data analysis for the nonparametric full data model	p. 211
3.9 Rigorous Analysis of a Bivariate Survival Estimate	p. 217
3.9.1 Proof of Theorem 3.2	p. 221
3.10 Prediction of Survival	p. 224
3.10.1 General methodology	p. 225
3.10.2 Prediction of survival with Regression Trees	p. 230
4 Cross-Sectional Data and Right-Censored Data Combined	p. 232
4.1 Model and General Data Structure	p. 232
4.2 Cause Specific Monitoring Schemes	p. 234
4.2.1 Overview	p. 235
4.3 The Optimal Mapping into Observed Data Estimating Functions	p. 236
4.3.1 Identifiability condition	p. 239
4.3.2 Estimation of a parameter on which we have current status data	p. 241
4.3.3 Estimation of a parameter on which we have right-censored data	p. 243
4.3.4 Estimation of a joint-distribution parameter on which we have current status data and right-censored data	p. 244
4.4 Estimation of the Optimal Index in the MGLM	p. 245
4.5 Example: Current Status Data with Time-Dependent Covariates	p. 246
4.5.1 Regression with current status data	p. 248
4.5.2 Previous work and comparison with our results	p. 250
4.5.3 An initial estimator	p. 251
4.5.4 The locally efficient one-step estimator	p. 252
4.5.5 Implementation issues	p. 253
4.5.6 Construction of confidence intervals	p. 255
4.5.7 A doubly robust estimator	p. 256
4.5.8 Data-adaptive selection of the location parameter	p. 257
4.5.9 Simulations	p. 257
4.5.10 Example 1: No unmodeled covariate	p. 258
4.5.11 Example 2: Unmodeled covariate	p. 258
4.5.12 Data Analysis: California Partners' Study	p. 260
4.6 Example: Current Status Data on a Process Until Death	p. 262
5 Multivariate Right-Censored Multivariate Data	p. 266
5.1 General Data Structure	p. 266
5.1.1 Modeling the censoring mechanism	p. 268
5.1.2 Overview	p. 270
5.2 Mapping into Observed Data Estimating Functions	p. 271
5.2.1 The initial mapping into observed data estimating functions	p. 271
5.2.2 Generalized Dabrowska estimator of the survival function in the nonparametric full data model	p. 273
5.2.3 Simulation study of the generalized Dabrowska estimator	p. 275
5.2.4 The proposed mapping into observed data estimating functions	p. 276
5.2.5 Choosing the full data estimating function in MGLM	p. 282
5.3 Bivariate Right-Censored Failure Time Data	p. 282
5.3.1 Introduction	p. 282
5.3.2 Locally efficient estimation with bivariate right-censored data	p. 286
5.3.3 Implementation of the locally efficient estimator	p. 290
5.3.4 Inversion of the information operator	p. 292
5.3.5 Asymptotic performance and confidence intervals	p. 293
5.3.6 Asymptotics	p. 294
5.3.7 Simulation methods and results for the nonparametric full data model	p. 299
5.3.8 Data analysis: Twin age at appendectomy	p. 302
6 Unified Approach for Causal Inference and Censored Data	p. 311
6.1 General Model and Method of Estimation	p. 311
6.2 Causal Inference with Marginal Structural Models	p. 318
6.2.1 Closed Form Formula for the Inverse of the Nonparametric Information Operator in Causal Inference Models	p. 324
6.3 Double Robustness in Point Treatment MSM	p. 326
6.4 Marginal Structural Model with Right-Censoring	p. 329
6.4.1 Doubly robust estimators in marginal structural models with right-censoring	p. 334
6.4.2 Data Analysis: SPARCS	p. 338
6.4.3 A simulation for estimators of a treatment-specific survival function	p. 343
6.5 Structural Nested Model with Right-Censoring	p. 347
6.5.1 The orthogonal complement of a nuisance tangent space in a structural nested model without censoring	p. 353
6.5.2 A class of estimating functions for the marginal structural nested model	p. 357
6.5.3 Analyzing dynamic treatment regimes	p. 359
6.5.4 Simulation for dynamic regimes in point treatment studies	p. 360
6.6 Right-Censoring with Missingness	p. 362
6.7 Interval Censored Data	p. 366
6.7.1 Interval censoring and right-censoring combined	p. 368
References	p. 371
Author index	p. 388
Subject index	p. 394
Example index	p. 397
