Mississippi State University
Perkins, Andy D.
Hoffmann, Federico G.
Wan, Xiu-Feng (Henry)
Date of Degree
Dissertation - Open Access
Computational Engineering (Program)
Doctor of Philosophy
James Worth Bagley College of Engineering
Computational Engineering Program
In this study we build solutions to three common challenges in the fields of bioinformatics through utilizing statistical methods and developing computational approaches. First, we address a common problem in genome wide association studies, which is linking genotype features within organisms of the same species to their phenotype characteristics. We specifically studied FHA domain genes in Arabidopsis thaliana distributed within Eurasian regions by clustering those plants that share similar genotype characteristics and comparing that to the regions from which they were taken. Second, we also developed a tool for calculating transposable element density within different regions of a genome. The tool is built to utilize the information provided by other transposable element annotation tools and to provide the user with a number of options for calculating the density for various genomic elements such as genes, piRNA and miRNA or for the whole genome. It also provides a detailed calculation of densities for each family and subamily of the transposable elements. Finally, we address the problem of mapping multi reads in the genome and their effects on gene expression. To accomplish this, we implemented methods to determine the statistical significance of expression values within the genes utilizing both a unique and multi-read weighting scheme. We believe this approach provides a much more accurate measure of gene expression than existing methods such as discarding multi reads completely or assigning them randomly to a set of best assignments, while also providing a better estimation of the proper mapping locations of ambiguous reads. Overall, the solutions we built in these studies provide researchers with tools and approaches that aid in solving some of the common challenges that arise in the analysis of high throughput sequence data.
Aldwairi, Tamer Ali, "Computational Methods for Solving Next Generation Sequencing Challenges" (2014). Theses and Dissertations. 1140.