DTPQDT02019019642.pdf

Cox Regression with Survival-Time-Dependent Missing Covariate Values

by Yanyao Yi

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) at the UNIVERSITY OF WISCONSIN–MADISON, 2019

Date of final oral examination: 04/24/2019

The dissertation is approved by the following members of the Final Oral Committee:
Richard Chappell, Professor, Department of Statistics
Zhengjun Zhang, Professor, Department of Statistics
Lu Mao, Assistant Professor, Department of Biostatistics and Medical Informatics
Menggang Yu, Professor, Department of Biostatistics and Medical Informatics
Jun Shao, Professor, Department of Statistics

ProQuest Number: 22583529. Published by ProQuest LLC. Copyright of the Dissertation is held by the Author. All rights reserved.

© Copyright by Yanyao Yi 2019. All Rights Reserved.

Abstract

Analysis with time-to-event data in clinical and epidemiological studies often encounters missing covariate values, and the missing at random assumption is commonly adopted (e.g., Qi et al., 2005). This assumption states that missingness depends on the observed data, including the observed outcome, which is the minimum of the survival and censoring times. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time (Rathouz, 2007).
This is especially so when covariate missingness is related to an unmeasured variable that is affected by survival time but does not causally affect survival. If this is the case, then the covariate missingness is not at random as the survival time is censored, and it creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well-known Cox proportional hazard model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.

Contents

Abstract
1 Introduction
2 Method
  2.1 Doubly Weighted Estimator
  2.2 Compositely Weighted Estimator
3 Simulation
  3.1 Simulation Settings
  3.2 Computation
  3.3 Simulation Results
4 Example
  4.1 Data Description
  4.2 Missingness Mechanism
  4.3 Main Result
5 Proofs
  5.1 Proof of the fact that (1.5) implies (1.4)
  5.2 Proof of the fact that (1.5) and (1.2) imply (1.6)
  5.3 Proof of the fact that (1.5) implies (1.7)
  5.4 Proof of Theorems 2.1.1 and 2.2.1
    5.4.1 Notation
    5.4.2 Lemma
    5.4.3 Proof of Theorem 2.1.1
    5.4.4 Proof of Theorem 2.2.1
6 Discussion
  6.1 Discussion on Weights Estimation
    6.1.1 Missingness Propensity for Simulation Setting 2
    6.1.2 Censoring Weight for Simulation Setting 2
  6.2 Discussion on the effect of Kernel Estimator $\hat{\ }_1$
  6.3 Discussion on Item Missingness of Multivariate $X$
  6.4 Discussion on Time-varying covariates
References

List of Tables

3.1 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 1
3.2 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 2
3.3 Bias, SD, SE, and CP based on 2000 simulation runs for estimation of $\beta$ under setting 3
3.4 Relative Efficiency of DWE with respect to CWE (variance ratio)
4.1 Estimates of coefficients of covariates in Cox regression with stage III NSCLC data
4.2 Mutual comparison between CC, SWE and CWE for Stage III NSCLC data

List of Figures

1.1 DAG constructed under assumptions (1.5) and (1.2)
4.1 Decision tree for assumption (1.3)
4.2 Decision tree for assumption (1.4)
6.1 Comparison between estimated $\hat{\ }_1(T, X)$ and true $\ _1(T, X)$ over $T$
6.2 Comparison between estimated $\hat{\ }(T, X)$ and true $\ (T, X)$ over $T$

Acknowledgments

There is no doubt that the first person to whom I would like to express my deepest gratitude is my advisor Prof. Jun Shao, for the continuous support of my study and related research, and for his patience, motivation, and immense knowledge. It is no exaggeration to say that Prof. Shao changed my life and will continue to influence it, in my attitudes toward both working and living. I could not have imagined having a better advisor. My sincere thanks also go to Guang, Prof. Shao's wife, who has always treated my wife and me as family and made us feel that Madison is our home. I would also like to thank my co-advisor Prof. Menggang Yu for his support through a research assistantship, which provided me not only financial support but also important research experience.

Besides my advisors, I would like to thank the rest of my thesis committee, Prof. Richard Chappell, Prof. Lu Mao and Prof. Zhengjun Zhang, for their insightful comments and encouragement, and also for the hard questions that motivated me to widen my research from various perspectives. In addition, I owe my thanks to Prof. Zhengjun Zhang and Chunming Zhang for teaching me my first statistics courses, STAT 609, 610 and 849, which introduced me to the spectacular statistical world and established a solid foundation for my Ph.D. study. I am also indebted to Prof. Qiongshi Lv for leading me into genetic statistics in the last year of my Ph.D., bringing me to explore this exciting field and writing papers with me.

A special mention goes to my best friends met in Madison, Muxuan Liang and Xiuyu Ma: it was fantastic to have the opportunity to spend the majority of my Ph.D. life with you.

Last but not least, I would like to thank my parents for their unconditional love and support, and especially my wife Dr.
Ting Ye, for her everlasting love, understanding, and support, and for making my life wonderful.

Chapter 1

Introduction

Cox regression is one of the most popular methods for dealing with censored failure times in survival analysis. For a continuous failure time $T$ and covariate vector $V$ measured at baseline, in this paper we consider the following Cox proportional hazard model,

$$\lambda(t \mid V) = \lambda_0(t) \exp(\beta^\top V), \tag{1.1}$$

where $\lambda(t \mid V)$ is the hazard at time $t$ given $V$, $\lambda_0(t)$ is an unspecified baseline hazard function common for all subjects, $\beta$ is a vector of unknown parameters, and $\beta^\top$ is its transpose. In many survival studies, there exists censoring. In this paper, we focus on right censoring, i.e., there is a continuous censoring time $C$ and what we observe are $T \wedge C = \min(T, C)$ and $\delta = I\{T \le C\}$, the indicator of the event $T \le C$. A common assumption on censoring is

$$T \perp C \mid V, \tag{1.2}$$

i.e., $T$ and $C$ are conditionally independent given $V$. Based on a random sample from the distribution of $(T \wedge C, \delta, V)$, $\beta$ can be estimated by maximizing the partial likelihood derived in Cox (1975) under (1.1) and (1.2), and the asymptotic properties of this estimator can be found in Andersen and Gill (1982).

In clinical and epidemiological studies some components of the covariate vector $V$ may have missing values and the partial likelihood cannot be directly applied. Let $V = (X, Z)$ with $X$ being the sub-vector that may have missing values and $Z$ being the sub-vector that is always observed, and let $R$ be the indicator equaling 1 if $X$ is completely observed and 0 if at least one component of $X$ is missing. As pointed out in Paik and Tsai (1997), Lipsitz and Ibrahim (1998), and Rathouz (2007), the complete-case analysis with the partial likelihood based only on subjects with $R = 1$ is valid if $R \perp (T, C, V)$, i.e., missing completely at random (Rubin, 1976), or $R \perp (T, C) \mid V$, i.e., missingness depends only on $V$, not on the outcome $(T, C)$, although missingness may be not at random and methods more efficient than complete-case analysis can be derived, e.g., Lin and Ying (1993) and Cook et al. (2011).
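As a rough illustration of the partial-likelihood estimator for model (1.1) under censoring assumption (1.2), the following minimal sketch fits a scalar coefficient by Newton's method on fully observed covariates. It is not the dissertation's code: the function names `cox_score_and_info`, `fit_cox`, and `simulate`, the unit baseline hazard, the exponential censoring rate, and the restriction to a single covariate without tied event times are all assumptions made for illustration.

```python
import math
import random


def cox_score_and_info(beta, data):
    """Score and observed information of the Cox log partial likelihood.

    data is a list of (time, delta, v) with a scalar covariate v; event
    times are assumed to have no ties (illustrative simplification).
    """
    rows = sorted(data, key=lambda r: -r[0])  # descending observed time
    s0 = s1 = s2 = 0.0  # risk-set sums of w, w*v, w*v^2 with w = exp(beta*v)
    score = info = 0.0
    for _, delta, v in rows:
        w = math.exp(beta * v)
        s0 += w
        s1 += w * v
        s2 += w * v * v
        if delta:  # only uncensored subjects contribute terms
            mean = s1 / s0
            score += v - mean
            info += s2 / s0 - mean * mean
    return score, info


def fit_cox(data, iters=25):
    """Newton iterations for the maximum partial likelihood estimate."""
    beta = 0.0
    for _ in range(iters):
        score, info = cox_score_and_info(beta, data)
        beta += score / info
    return beta


def simulate(n, beta, seed):
    """Hypothetical data: hazard exp(beta*v), independent censoring."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        t = rng.expovariate(math.exp(beta * v))  # baseline hazard fixed at 1
        c = rng.expovariate(0.5)                 # censoring time, mean 2
        out.append((min(t, c), t <= c, v))
    return out
```

Calling `fit_cox(simulate(2000, 0.7, seed=1))` should return a value close to the true coefficient 0.7. A complete-case analysis would apply the same function to the subset of records with $R = 1$, which is justified only under the conditions on $R$ stated above.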
However, missingness of covariate values is often believed to be outcome related, either directly or indirectly. In survival studies, some researchers assume that missingness is $T \wedge C$ related, i.e.,

$$R \perp (X, T, C) \mid (T \wedge C, Z, \delta), \tag{1.3}$$

(e.g., Lipsitz and Ibrahim, 1998; Chen and Little, 1999; Herring and Ibrahim, 2001; Chen, 2002), which is a type of missing at random assumption (Rubin, 1976) because $T \wedge C$, $Z$ and $\delta$ are all observed. Even if $R$ is related to $(T, C)$, however, it is hard to imagine why $R$ is related to $T \wedge C$, a very special function of the outcome $(T, C)$. We speculate that (1.3) is assumed for an easy analysis, since one can simply use methods valid under missing at random, e.g., inverse propensity weighting with estimated $\mathrm{pr}(R = 1 \mid T \wedge C, Z, \delta)$ based on the always observed $(T \wedge C, Z, \delta)$ (Wang and Chen, 2001; Qi et al., 2005; Xu et al., 2009).

Rathouz (2007) argues that the following survival-time-dependent missingness mechanism is more reasonable than assumption (1.3) in many biomedical studies,

$$R \perp (X, C) \mid (T, Z). \tag{1.4}$$

One scenario in which (1.4) holds is when there is an unmeasured variable $U$ satisfying

$$R \perp (T, C, X) \mid (U, Z) \quad \text{and} \quad U \perp (C, X) \mid (T, Z). \tag{1.5}$$

The first condition in (1.5) means that missingness of $X$ is driven by $U$ together with the observed $Z$, whereas the second condition in (1.5) says that $U$ does not depend on $(C, X)$ when $(T, Z)$ is given. It is shown in Sections 5.1 and 5.2, respectively, that (1.5) implies (1.4), and that (1.5) together with (1.2) implies

$$f(R, U, T, C, V) = f(R \mid U, Z)\, f(U \mid T, Z)\, f(T \mid V)\, f(C, V), \tag{1.6}$$

where $f(\cdot \mid \cdot)$ or $f(\cdot)$ is used as a generic notation for a conditional or unconditional density. Result (1.6) indicates that $U$ does not causally affect $T$, i.e., it does not enter $f(T \mid V)$, but $U$ is affected by $T$ through $f(U \mid T, Z)$. It is also shown in Section 5.3 that (1.5) implies

$$\mathrm{pr}(R = 1 \mid T, Z) = E\{\mathrm{pr}(R = 1 \mid U, Z) \mid T, Z\}, \tag{1.7}$$

which means $T$ is a surrogate for the unmeasured $U$ in the missingness propensity.

Figure 1.1: DAG constructed under assumptions (1.5) and (1.2)

The above Directed Acyclic Graph (DAG) constructed under assumptions (1.5) and (1.2) provides an illustration of result (1.6).
More details about DAGs and graphical models can be found in Pearl et al. (2016). An example of $U$ could be some phenotype of potential survival or a subjective assessment, as the following example indicates.

Example 1. In our real data example analyzed in Chapter 4, based on the National Cancer Database (NCDB, https://www.facs.org/quality-programs/cancer/ncdb), all patients were diagnosed with stage III Non-Small Cell Lung Cancer (NSCLC) at baseline, but only about 40 patients had the more accurate tumor stage $X$ recorded as either stage IIIA or stage IIIB. According to the lung cancer staging system provided by the American Joint Committee on Cancer (https://cancerstaging.org/references-tools/quickreferences/Documents/LungMedium.pdf), the main difference between stage IIIA and stage IIIB is the nodal involvement, i.e., stage IIIB has much more extensive metastasis in regional lymph nodes regardless of tumor size. Measuring $X$ is difficult and may require invasive techniques. Recent advanced non-invasive methods such as positron emission tomography or computed tomography do not provide definitive stage confirmation, since the region around the lung to be examined for lymph node involvement includes the superior mediastinum, the lower mediastinum, the aortopulmonary window, and para-aorta (Teran and Brock, 2014). Thus, a major reason for missing $X$ in this example is the physician's judgment on whether an invasive test for a difficult measurement of $X$ is worthwhile. Such judgment is usually made according to the objective measurements for the patient, the observed variables $Z$, and some subjective assessments, the unrecorded variable $U$ in (1.5), which can be affected by the patient's prognostic factors $Z$ and illness related to the potential survival time $T$, but is unlikely to be directly affected by the censoring time $C$. Thus, assumption (1.4) is more reasonable than assumption (1.3) for this real data example.

Since $T$ may be censored, the missingness mechanism (1.4) is not missing at random.
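The statement that mechanism (1.4) is not missing at random can be checked numerically on simulated data. The sketch below draws data so that (1.2) and (1.5) hold by construction and then looks only at censored subjects whose censoring time falls in a narrow band, so that their observed outcome is nearly the same. Everything specific here is an assumption for illustration and not taken from the dissertation: the coefficients, the form of $U$ as noisy $\log T$ plus a $Z$ effect, the logistic propensity, and the names `simulate_mechanism`, `observed_rate`, and `rate_gap_among_censored`.

```python
import math
import random


def expit(a):
    return 1.0 / (1.0 + math.exp(-a))


def simulate_mechanism(n, seed):
    """Draw (T, C, X, Z, R) so that (1.2) and (1.5) hold by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0
        x = rng.gauss(0.0, 1.0)
        t = rng.expovariate(math.exp(0.5 * x + 0.5 * z))  # Cox model, baseline hazard 1
        c = rng.expovariate(0.5)                          # C independent of T given V
        u = math.log(t) + 0.5 * z + rng.gauss(0.0, 0.5)   # U driven by (T, Z) only
        r = rng.random() < expit(u + 0.3 * z)             # R driven by (U, Z) only
        rows.append((t, c, x, z, r))
    return rows


def observed_rate(rows):
    """Fraction of subjects with X observed (R = 1)."""
    return sum(r for *_, r in rows) / len(rows)


def rate_gap_among_censored(rows, c_lo=0.2, c_hi=0.4):
    """Among censored subjects with C in [c_lo, c_hi], compare pr(R = 1)
    for long versus short true survival time T (unobservable in practice)."""
    band = [row for row in rows if row[0] > row[1] and c_lo <= row[1] <= c_hi]
    band.sort(key=lambda row: row[0])
    half = len(band) // 2
    return observed_rate(band[half:]) - observed_rate(band[:half])
```

In this setup, `rate_gap_among_censored(simulate_mechanism(50000, 0))` is clearly positive: subjects with nearly identical observed outcomes still differ in their chance of having $X$ recorded, depending on a survival time the analyst never sees. That dependence on unobserved data is what a propensity model based only on $(T \wedge C, Z, \delta)$ cannot capture.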
If (1.4) holds instead of (1.3), then the estimators in Wang and Chen (2001), Qi et al. (2005) and Xu et al. (2009) will yield biased results. Also, if (1.4) holds and $T$ cannot be excluded from the missingness propensity, then the complete-case analysis is inconsistent. Although the survival function is identifiable under (1.4) and censoring assumption (1.2) (Rathouz, 2007), there is no proposed method for estimating $\beta$ in Rathouz (2007) and afterwards. The main challenge is that $\mathrm{pr}(R = 1 \mid T, Z)$ cannot be directly estimated when $T$ is censored.

In this article, we construct two inverse probability weighting estimators of $\beta$ in the Cox proportional hazard model (1.1) under assumptions (1.2) and (1.4). The first one is based on a weighted score function using only subjects with observed survival time and completely observed covariates. The first estimator may not be efficient since only data with $(\delta, R) = (1, 1)$ are involved in the score function, but this estimator is used as an initial estimator in the construction of a more efficient second estimator based on a weighted score function using all data with $R = 1$, censored or not. The major step in obtaining th
