Mississippi State University
Date of Degree
Dissertation - Open Access
Electrical and Computer Engineering
Doctor of Philosophy (Ph.D)
James Worth Bagley College of Engineering
Department of Electrical and Computer Engineering
RNA (ribonucleic acid) sequencing is a powerful technology that gives researchers essential information about the functionality of genes. Transcriptomic study and downstream analysis highlight the functioning of the genes associated with a specific biological process or treatment. In practice, differentially expressed genes associated with a particular treatment or genotype are subjected to downstream analysis to find a critical set of genes. This critical set of genes or gene pathways indicates the effect of the treatment in a cell or tissue. This dissertation describes a multi-stage framework for finding these critical sets of genes using different analysis methodologies and inference algorithms.
RNA sequencing technology helps to find the differentially expressed genes associated with the treatments and genotypes. The preliminary step of RNA-seq analysis consists of extracting the mRNA (messenger RNA), followed by preparation of the mRNA libraries and sequencing on the Illumina HiSeq 2000 platform. The later-stage analysis starts with mapping the RNA sequencing data (obtained from the previous step) to the genome annotations and counting the reads for each annotated gene to produce the gene expression data. The second step involves using statistical methods such as linear model fitting, clustering, and probabilistic graphical modeling to analyze the role of genes and gene networks in treatment responses.
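As a rough illustration of the linear model fitting step mentioned above, the following minimal sketch fits a per-gene ordinary least squares model of log-transformed read counts against a treatment indicator. It is not the dissertation's package: the function name `fit_treatment_effect`, the `log2(count + 1)` transform, and the toy counts are all assumptions made for this example.

```python
import math

def fit_treatment_effect(counts, treated):
    """OLS fit of log2(count + 1) = intercept + effect * treated for one gene.

    counts  : read counts for one gene, one value per sample (hypothetical data)
    treated : booleans of the same length, True for treated samples
    """
    y = [math.log2(c + 1) for c in counts]      # assumed transform
    x = [1.0 if t else 0.0 for t in treated]    # treatment indicator
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    effect = sxy / sxx                  # slope: log2 fold change between groups
    intercept = mean_y - effect * mean_x
    return intercept, effect

# Toy example: two control samples and two treated samples for a single gene.
intercept, effect = fit_treatment_effect([3, 3, 15, 15],
                                         [False, False, True, True])
print(f"baseline log2 expression: {intercept:.2f}, treatment effect: {effect:.2f}")
```

With a two-level indicator, the fitted slope equals the difference between the group means of the transformed counts, so it can be read as a log2 fold change. A real analysis would also need normalization and a significance test, which this sketch omits.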
In this dissertation, an R software package is developed that brings together all the RNA sequencing analysis steps and the downstream analysis in the R and Linux environment.
An inference methodology based on loopy belief propagation is then applied to the gene networks to infer the differential expression of genes. The loopy belief propagation algorithm uses a computational modeling framework that takes as input the gene expression data and the transcription factors interacting with the genes. The inference method starts with constructing a gene-transcription factor network using an undirected probabilistic graphical modeling approach. Belief messages are then propagated across all the nodes of the graph.
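The message-passing idea described above can be sketched on a toy network. The code below is a minimal, generic sum-product loopy belief propagation routine for a pairwise undirected model with binary states (0 = not differentially expressed, 1 = differentially expressed). It is an illustrative assumption, not the dissertation's implementation: the function `loopy_bp`, the node names, the node potentials, and the single `coupling` parameter that rewards neighbouring nodes sharing a state are all hypothetical.

```python
def loopy_bp(nodes, edges, node_pot, coupling=2.0, max_iter=100, tol=1e-9):
    """Sum-product loopy belief propagation on a pairwise binary model.

    nodes    : list of node names (genes and transcription factors)
    edges    : list of undirected (a, b) pairs
    node_pot : dict mapping node -> [potential for state 0, state 1]
    coupling : assumed edge potential for equal states (unequal states get 1.0)
    """
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def psi(xi, xj):
        # Edge potential: coupling > 1 favours neighbours in the same state.
        return coupling if xi == xj else 1.0

    # msgs[(i, j)][x] is the message from node i to node j about j's state x.
    msgs = {(i, j): [0.5, 0.5] for i in nodes for j in neighbors[i]}

    for _ in range(max_iter):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = node_pot[i][xi] * psi(xi, xj)
                    for k in neighbors[i]:
                        if k != j:          # exclude the recipient's own message
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]   # normalise for stability
        delta = max(abs(new[e][x] - msgs[e][x]) for e in msgs for x in (0, 1))
        msgs = new
        if delta < tol:
            break

    # Belief at each node: local potential times all incoming messages.
    beliefs = {}
    for i in nodes:
        b = list(node_pot[i])
        for k in neighbors[i]:
            for x in (0, 1):
                b[x] *= msgs[(k, i)][x]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs

# Hypothetical network: two transcription factors each regulating two genes,
# which forms a cycle, so the propagation is genuinely "loopy".
nodes = ["TF1", "TF2", "geneA", "geneB"]
edges = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA"), ("TF2", "geneB")]
node_pot = {
    "TF1": [0.5, 0.5],
    "TF2": [0.5, 0.5],
    "geneA": [0.1, 0.9],   # assumed strong evidence of differential expression
    "geneB": [0.5, 0.5],   # no direct evidence
}
beliefs = loopy_bp(nodes, edges, node_pot)
for name in nodes:
    print(name, round(beliefs[name][1], 3))
```

On a graph without cycles this procedure returns exact marginals; on a graph with cycles, such as the toy network here, the beliefs are approximations and convergence is not guaranteed in general. How node potentials would be derived from real expression data is left open in this sketch.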
The analysis and inference methods explained in the dissertation were applied to Arabidopsis plants of two different genotypes subjected to two different stress treatments. The results of the analysis and inference methods are reported in the dissertation.
Srivastava, Himangi, "Methods for inference and analysis of gene networks from RNA sequencing data" (2021). Theses and Dissertations. 5352.