# DTPQDT02019019642.pdf

Cox Regression with Survival-Time-Dependent Missing Covariate Values

by Yanyao Yi

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) at the University of Wisconsin–Madison, 2019

Date of final oral examination: 04/24/2019

The dissertation is approved by the following members of the Final Oral Committee: Richard Chappell, Professor, Department of Statistics; Zhengjun Zhang, Professor, Department of Statistics; Lu Mao, Assistant Professor, Department of Biostatistics and Medical Informatics; Menggang Yu, Professor, Department of Biostatistics and Medical Informatics; Jun Shao, Professor, Department of Statistics.

© Copyright by Yanyao Yi 2019. All Rights Reserved.

## Abstract

Analysis of time-to-event data in clinical and epidemiological studies often encounters missing covariate values, and the missing at random assumption is commonly adopted (e.g., Qi et al., 2005). This assumes that missingness depends only on the observed data, including the observed outcome, which is the minimum of the survival and censoring times. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time (Rathouz, 2007).
This is especially so when covariate missingness is related to an unmeasured variable that is affected by survival time but does not causally affect survival. If this is the case, then the covariate missingness is not at random because the survival time may be censored, and this creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well-known Cox proportional hazards model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.

## Contents

- Abstract
- 1 Introduction
- 2 Method
  - 2.1 Doubly Weighted Estimator
  - 2.2 Compositely Weighted Estimator
- 3 Simulation
  - 3.1 Simulation Settings
  - 3.2 Computation
  - 3.3 Simulation Results
- 4 Example
  - 4.1 Data Description
  - 4.2 Missingness Mechanism
  - 4.3 Main Result
- 5 Proofs
  - 5.1 Proof of the fact that (1.5) implies (1.4)
  - 5.2 Proof of the fact that (1.5) and (1.2) imply (1.6)
  - 5.3 Proof of the fact that (1.5) implies (1.7)
  - 5.4 Proof of Theorems 2.1.1 and 2.2.1
    - 5.4.1 Notation
    - 5.4.2 Lemma
    - 5.4.3 Proof of Theorem 2.1.1
    - 5.4.4 Proof of Theorem 2.2.1
- 6 Discussion
  - 6.1 Discussion on Weights Estimation
    - 6.1.1 Missingness Propensity for Simulation Setting 2
    - 6.1.2 Censoring Weight for Simulation Setting 2
  - 6.2 Discussion on the effect of Kernel Estimator $\hat{\cdot}_1$
  - 6.3 Discussion on Item Missingness of Multivariate $X$
  - 6.4 Discussion on Time-varying covariates
- References

## List of Tables

- 3.1 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 1
- 3.2 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 2
- 3.3 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 3
- 3.4 Relative Efficiency of DWE with respect to CWE (variance ratio)
- 4.1 Estimates of coefficients of covariates in Cox regression with stage III NSCLC data
- 4.2 Mutual comparison between CC, SWE and CWE for Stage III NSCLC data

## List of Figures

- 1.1 DAG constructed under assumptions (1.5) and (1.2)
- 4.1 Decision tree for assumption (1.3)
- 4.2 Decision tree for assumption (1.4)
- 6.1 Comparison between estimated $\hat{\cdot}_1(T, X)$ and true $\cdot_1(T, X)$ over $T$
- 6.2 Comparison between estimated $\hat{\cdot}(T, X)$ and true $\cdot(T, X)$ over $T$

## Acknowledgments

There is no doubt that the first person to whom I would like to express my deepest gratitude is my advisor, Prof. Jun Shao, for the continuous support of my study and related research, and for his patience, motivation, and immense knowledge. It is no exaggeration to say that Prof. Shao changed my life and will continue to influence my future, in my attitudes toward both work and life. I could not have imagined having a better advisor. My sincere thanks also go to Guang, Prof. Shao's wife, who has always treated my wife and me as family and made Madison feel like our home. I would also like to thank my co-advisor, Prof. Menggang Yu, for his support through a research assistantship, which provided me not only with financial support but also with important research experience.

Besides my advisors, I would like to thank the rest of my thesis committee, Prof. Richard Chappell, Prof. Lu Mao and Prof. Zhengjun Zhang, for their insightful comments and encouragement, and also for the hard questions which incented me to widen my research from various perspectives. In addition, I owe my thanks to Prof. Zhengjun Zhang and Chunming Zhang for teaching me my first statistics courses, STAT 609, 610 and 849, which introduced me to the spectacular statistical world and established a solid foundation for my Ph.D. study. I am also indebted to Prof. Qiongshi Lv for leading me into genetic statistics in the last year of my Ph.D., bringing me to explore this exciting field and writing papers with me.

A special mention goes to my best friends in Madison, Muxuan Liang and Xiuyu Ma: it was fantastic to have the opportunity to spend the majority of my Ph.D. life with you. Last but not least, I would like to thank my parents for their unconditional love and support, and especially my wife Dr.
Ting Ye, for her everlasting love, understanding, and support, and for making my life wonderful.

## Chapter 1 Introduction

Cox regression is one of the most popular methods for handling censored failure times in survival analysis. For a continuous failure time $T$ and covariate vector $V$ measured at baseline, in this paper we consider the following Cox proportional hazards model,

$$\lambda(t \mid V) = \lambda_0(t) \exp(\beta^\top V), \tag{1.1}$$

where $\lambda(t \mid V)$ is the hazard at time $t$ given $V$, $\lambda_0(t)$ is an unspecified baseline hazard function common to all subjects, $\beta$ is a vector of unknown parameters, and $\beta^\top$ is its transpose. Many survival studies involve censoring. In this paper, we focus on right censoring, i.e., there is a continuous censoring time $C$ and what we observe are $T \wedge C = \min(T, C)$ and $\delta = I\{T \le C\}$, the indicator of the event $T \le C$. A common assumption on censoring is

$$T \perp C \mid V, \tag{1.2}$$

i.e., $T$ and $C$ are conditionally independent given $V$. Based on a random sample from the distribution of $(T \wedge C, \delta, V)$, $\beta$ can be estimated by maximizing the partial likelihood derived in Cox (1975) under (1.1) and (1.2), and the asymptotic properties of this estimator can be found in Andersen and Gill (1982).

In clinical and epidemiological studies, some components of the covariate vector $V$ may have missing values, and the partial likelihood cannot be directly applied. Let $V = (X, Z)$, with $X$ being the sub-vector that may have missing values and $Z$ being the sub-vector that is always observed, and let $R$ be the indicator equaling 1 if $X$ is completely observed and 0 if at least one component of $X$ is missing. As pointed out in Paik and Tsai (1997), Lipsitz and Ibrahim (1998), and Rathouz (2007), the complete-case analysis with the partial likelihood based only on subjects with $R = 1$ is valid if $R \perp (T, C, V)$, i.e., missing completely at random (Rubin, 1976), or $R \perp (T, C) \mid V$, i.e., missingness depends only on $V$, not on the outcome $(T, C)$, although in the latter case missingness may not be at random, and methods more efficient than complete-case analysis can be derived, e.g., Lin and Ying (1993) and Cook et al. (2011).

However, missingness of covariate values is often believed to be outcome related, either directly or indirectly. In survival studies, some researchers assume that missingness is related to $T \wedge C$, i.e.,

$$R \perp (X, T, C) \mid (T \wedge C, Z, \delta) \tag{1.3}$$

(e.g., Lipsitz and Ibrahim, 1998; Chen and Little, 1999; Herring and Ibrahim, 2001; Chen, 2002), which is a type of missing at random assumption (Rubin, 1976) because $T \wedge C$, $Z$ and $\delta$ are all observed. Even if $R$ is related to $(T, C)$, however, it is hard to imagine why $R$ would be related to $T \wedge C$, a very special function of the outcome $(T, C)$. We speculate that (1.3) is assumed for ease of analysis, since one can simply use methods valid under missing at random, e.g., inverse propensity weighting with $\mathrm{pr}(R = 1 \mid T \wedge C, Z, \delta)$ estimated from the always observed $(T \wedge C, Z, \delta)$ (Wang and Chen, 2001; Qi et al., 2005; Xu et al., 2009).

Rathouz (2007) argues that the following survival-time-dependent missingness mechanism is more reasonable than assumption (1.3) in many biomedical studies,

$$R \perp (X, C) \mid (T, Z). \tag{1.4}$$

One scenario in which (1.4) holds is when there is an unmeasured variable $U$ satisfying

$$R \perp (T, C, X) \mid (U, Z) \quad \text{and} \quad U \perp (C, X) \mid (T, Z). \tag{1.5}$$

The first condition in (1.5) means that missingness of $X$ is driven by $U$ together with the observed $Z$, whereas the second condition in (1.5) says that $U$ does not depend on $(C, X)$ when $(T, Z)$ is given. It is shown in Sections 5.1 and 5.2, respectively, that (1.5) implies (1.4), and that (1.5) together with (1.2) implies

$$f(R, U, T, C, V) = f(R \mid U, Z)\, f(U \mid T, Z)\, f(T \mid V)\, f(C, V), \tag{1.6}$$

where $f(\cdot \mid \cdot)$ or $f(\cdot)$ is used as a generic notation for a conditional or unconditional density. Result (1.6) indicates that $U$ does not causally affect $T$, i.e., it does not enter $f(T \mid V)$, but $U$ is affected by $T$ through $f(U \mid T, Z)$.
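To make the factorization in (1.6) concrete, the following Python sketch draws data in the order $f(C, V)$, $f(T \mid V)$, $f(U \mid T, Z)$, $f(R \mid U, Z)$. All distributions and constants here (a binary $X$, a normal $Z$, a constant baseline hazard, a logistic missingness model, and the function name `simulate_factorization`) are hypothetical choices made only for illustration; they are not the simulation settings used elsewhere in this work.

```python
import numpy as np


def simulate_factorization(n, beta_x=0.5, beta_z=-0.5, seed=0):
    """Draw n subjects following the factorization in (1.6).

    Every distribution and constant below is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)

    # f(C, V): covariates V = (X, Z), then the censoring time C given V.
    x = rng.binomial(1, 0.5, size=n).astype(float)  # may be missing
    z = rng.normal(size=n)                           # always observed
    c = rng.exponential(scale=2.0 * np.exp(0.3 * z))

    # f(T | V): Cox model (1.1) with constant baseline hazard 1, so T is
    # exponential with rate exp(beta_x * X + beta_z * Z). T is drawn without
    # reference to C, so T and C are independent given V, as in (1.2).
    t = rng.exponential(scale=np.exp(-(beta_x * x + beta_z * z)))

    # f(U | T, Z): the unmeasured U depends on (T, Z) only, not on (C, X).
    u = np.log(t) + 0.5 * z + rng.normal(size=n)

    # f(R | U, Z): X is observed (R = 1) with a logistic probability in (U, Z).
    prob_observed = 1.0 / (1.0 + np.exp(-(0.5 + u + 0.3 * z)))
    r = rng.binomial(1, prob_observed)

    return {
        "time": np.minimum(t, c),          # observed T ^ C
        "delta": (t <= c).astype(int),     # event indicator I{T <= C}
        "x": np.where(r == 1, x, np.nan),  # X masked where R = 0
        "z": z,
        "r": r,
        "t_latent": t,                     # kept only for checking the mechanism
    }


data = simulate_factorization(5000)
long_survivors = data["t_latent"] > np.median(data["t_latent"])
rate_long = data["r"][long_survivors].mean()    # share with X observed, long T
rate_short = data["r"][~long_survivors].mean()  # share with X observed, short T
```

Comparing `rate_long` with `rate_short` shows that, in this hypothetical setup, $X$ is observed more often among subjects with longer latent survival times even though $R$ is generated from $(U, Z)$ alone, which is the sense in which $T$ drives missingness through $U$. Because `t_latent` is not observed for censored subjects, such a comparison is not available in real data.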
It is also shown in Section 5.3 that (1.5) implies

$$\mathrm{pr}(R = 1 \mid T, Z) = E\{\mathrm{pr}(R = 1 \mid U, Z) \mid T, Z\}, \tag{1.7}$$

which means that $T$ is a surrogate for the unmeasured $U$ in the missingness propensity. The Directed Acyclic Graph (DAG) in Figure 1.1, constructed under assumptions (1.5) and (1.2), provides an illustration of result (1.6). More details about DAGs and graphical models can be found in Pearl et al. (2016). An example of $U$ could be some phenotype of potential survival or a subjective assessment, as the following example indicates.

Figure 1.1: DAG constructed under assumptions (1.5) and (1.2)

Example 1. In our real data example analyzed in Chapter 4, based on the National Cancer Database (NCDB) (https://www.facs.org/quality-programs/cancer/ncdb), all patients were diagnosed with stage III Non-Small Cell Lung Cancer (NSCLC) at baseline, but only about 40% of patients had the more accurate tumor stage $X$ recorded as either stage IIIA or stage IIIB. According to the lung cancer staging system provided by the American Joint Committee on Cancer (https://cancerstaging.org/references-tools/quickreferences/Documents/LungMedium.pdf), the main difference between stage IIIA and stage IIIB is the nodal involvement, i.e., stage IIIB has much more extensive metastasis in regional lymph nodes regardless of tumor size. Measuring $X$ is difficult and may require invasive techniques. Recent advanced non-invasive methods such as positron emission tomography or computed tomography do not provide definitive stage confirmation, since the region around the lung to be examined for lymph node involvement includes the superior mediastinum, the lower mediastinum, the aortopulmonary window, and para-aorta (Teran and Brock, 2014). Thus, a major reason for missing $X$ in this example is the physician's judgment on whether an invasive test for a difficult measurement of $X$ is worthwhile.
Such judgment is usually made according to the objective measurements for the patient, the observed variables $Z$, and some subjective assessments, the unrecorded variable $U$ in (1.5), which can be affected by the patient's prognostic factors $Z$ and illness related to the potential survival time $T$, but is unlikely to be directly affected by the censoring time $C$. Thus, assumption (1.4) is more reasonable than assumption (1.3) for this real data example.

Since $T$ may be censored, the missingness mechanism (1.4) is not missing at random. If (1.4) holds instead of (1.3), then the estimators in Wang and Chen (2001), Qi et al. (2005) and Xu et al. (2009) will yield biased results. Also, if (1.4) holds and $T$ cannot be excluded from the missingness propensity, then the complete-case analysis is inconsistent. Although the survival function is identifiable under (1.4) and the censoring assumption (1.2) (Rathouz, 2007), no method for estimating $\beta$ was proposed in Rathouz (2007) or afterwards. The main challenge is that $\mathrm{pr}(R = 1 \mid T, Z)$ cannot be directly estimated when $T$ is censored.

In this article, we construct two inverse probability weighting estimators of $\beta$ in the Cox proportional hazards model (1.1) under assumptions (1.2) and (1.4). The first one is based on a weighted score function using only subjects with observed survival time and completely observed covariates. The first estimator may not be efficient, since only data with $(\delta, R) = (1, 1)$ are involved in the score function, but this estimator is used as an initial estimator in the construction of a more efficient second estimator based on a weighted score function using all data with $R = 1$, censored or not. The major step in obtaining th