Theses and Dissertations

Issuing Body

Mississippi State University

Advisor

Zhou, Qian

Committee Member

Deng, Dewayne

Committee Member

Woody, Jon

Committee Member

Sepehrifar, Mohammad

Committee Member

Patil, Prakash; Wu, Tung-Lung

Date of Degree

8-7-2025

Original embargo terms

Visible MSU Only 2 Years

Document Type

Dissertation - Campus Access Only

Major

Mathematical sciences (Statistics)

Degree Name

Doctor of Philosophy (Ph.D.)

College

College of Arts and Sciences

Department

Department of Mathematics and Statistics

Abstract

This dissertation explored two missing-data problems in statistical analysis: imbalance in agricultural research and censoring in survival studies. The first study was inspired by the U.S. National Cotton Variety Test trial, in which the data are imbalanced because only a subset of varieties is selected for testing in the following year. We simulated selection processes that differ from those in the existing literature and make four main contributions. First, we adopted a joint modeling framework that uses logistic regression to generate data that are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Second, our selection depends on multiple traits, whereas all existing studies used a single trait for selection. Third, in addition to variance components (VC), we studied 30-year long-term genetic and non-genetic trends. Finally, we evaluated prediction accuracy for each variety’s overall and location-specific performance. The results show that estimation of the VC and the long-term trends is worst under MNAR when a single trait is used for selection. Compared with VC estimation, long-term trend estimation is more strongly influenced by the missingness mechanism and the missing rate. Prediction accuracy for variety performance is driven mainly by the missing rate and is less sensitive to the selection process. Ignoring the genetic and non-genetic long-term trends degrades both estimation and prediction, whereas adding more testing years improves them despite higher missingness.
The second study addresses censored clustered survival data by developing two multiple imputation (MI) strategies within a copula framework. A marginal MI method imputes censored times based solely on covariates, whereas conditional MI approaches also use the dependence between paired event times through a risk score framework. Both strategies apply Nearest Neighbor (NN) and Kernel Smoothing (KS) algorithms for imputing risk sets, and the subsequent analysis uses Two-Stage Pseudo Maximum Likelihood Estimation (PMLE). Simulations across censoring levels, cluster sizes, and dependence strengths reveal that MI improves the accuracy of marginal regression coefficient estimates but not of copula parameter estimates. NN-based imputation outperforms KS, and frailty-adjusted MI with NN is robust to copula-frailty misspecification. Marginal MI with NN is recommended for marginal targets; direct copula modeling via Two-Stage PMLE is preferred for analyses focused on the dependence structure.

Sponsorship

NSF grant DMS-2210481
