
Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Zhou, Qian
Committee Member
Deng, Dewayne
Committee Member
Woody, Jon
Committee Member
Sepehrifar, Mohammad
Committee Member
Patil, Prakash; Wu, Tung-Lung
Date of Degree
8-7-2025
Original embargo terms
Visible MSU Only 2 Years
Document Type
Dissertation - Campus Access Only
Major
Mathematical sciences (Statistics)
Degree Name
Doctor of Philosophy (Ph.D.)
College
College of Arts and Sciences
Department
Department of Mathematics and Statistics
Abstract
This dissertation explored two missing data problems in statistical analysis: imbalance in agricultural research and censoring in survival studies. The first study was inspired by the U.S. National Cotton Variety Test trial, where the trial data are imbalanced because only a subset of varieties is selected for the following year. We simulated selection processes that differ from those in the existing literature, and the study makes four main contributions. First, we adopted a joint modeling framework that uses a logistic regression to generate data that are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Second, our selection depends on multiple traits, whereas all existing studies used a single trait for selection. Third, besides variance components (VC), we studied 30-year long-term genetic and non-genetic trends. Last, we evaluated prediction accuracy for each variety’s overall and location-specific performance. Results show that estimation of the VC and long-term trends is worst under MNAR when a single trait is used for selection. Compared with VC estimation, long-term trend estimation is more strongly influenced by the missingness mechanism and rate. Prediction accuracy for a variety’s performance is mainly driven by the missing rate and is less sensitive to the selection process. Ignoring the genetic and non-genetic long-term trends degrades both estimation and prediction, whereas adding more testing years improves them despite higher missingness. The second study addresses censored clustered survival data by developing two multiple imputation (MI) strategies within a copula framework. A marginal MI method imputes censored times based solely on covariates, whereas conditional MI approaches also use the dependence between paired event times through a risk score framework. 
Both strategies apply Nearest Neighbor (NN) and Kernel Smoothing (KS) algorithms for imputing risk sets, and the subsequent analysis uses Two-Stage Pseudo Maximum Likelihood Estimation (PMLE). Simulations across censoring levels, cluster sizes, and dependence strengths reveal that MI improves the accuracy of marginal regression coefficient estimates but not of copula parameter estimates. NN-based imputation outperforms KS, and frailty-adjusted MI with NN is robust to copula-frailty misspecification. Marginal MI with NN is recommended for marginal targets; direct copula modeling via Two-Stage PMLE is preferred for analyses focused on the dependence structure.
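The three missingness mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's joint model: the function name `simulate_missingness`, the single observed covariate, and the `intercept` and `slope` values are all hypothetical choices made for illustration. The sketch assumes a logistic model for the probability that a target trait value is dropped, with lower values more likely to be dropped, loosely mimicking selection of better-performing varieties.

```python
import numpy as np


def simulate_missingness(observed, target, mechanism,
                         intercept=0.0, slope=1.5, rng=None):
    """Mask `target` under a logistic missingness model (illustrative only).

    MCAR: the missingness probability is constant.
    MAR:  the probability depends only on the fully observed covariate.
    MNAR: the probability depends on the target value itself.
    """
    rng = np.random.default_rng(rng)
    observed = np.asarray(observed, dtype=float)
    target = np.asarray(target, dtype=float)

    if mechanism == "MCAR":
        linear = np.full(target.shape, intercept)
    elif mechanism == "MAR":
        linear = intercept - slope * observed
    elif mechanism == "MNAR":
        linear = intercept - slope * target
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")

    # Logistic link: probability that each target value is missing.
    prob_missing = 1.0 / (1.0 + np.exp(-linear))
    missing = rng.random(target.shape) < prob_missing
    masked = np.where(missing, np.nan, target)
    return masked, prob_missing


# Hypothetical example: two correlated standardized traits for 200 varieties.
demo_rng = np.random.default_rng(2025)
trait_a = demo_rng.normal(size=200)                  # always observed
trait_b = 0.6 * trait_a + demo_rng.normal(size=200)  # subject to missingness

for mech in ("MCAR", "MAR", "MNAR"):
    masked_b, _ = simulate_missingness(trait_a, trait_b, mech, rng=1)
    print(mech, "missing rate:", np.isnan(masked_b).mean())
```

Under MNAR the retained values are a biased sample of the target trait, which is consistent with the abstract's finding that estimation is worst in that setting; the sketch only shows the mechanism and does not reproduce any result from the dissertation.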
Sponsorship (Optional)
NSF grant DMS-2210481
Recommended Citation
Fang, Zhou, "Missing data in agriculture and survival studies" (2025). Theses and Dissertations. 6640.
https://scholarsjunction.msstate.edu/td/6640