Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Perkins, Andy D.
Committee Member
Hoffmann, Federico G.
Committee Member
DuBien, Janice
Committee Member
Wan, Xiu-Feng (Henry)
Date of Degree
12-13-2014
Document Type
Dissertation - Open Access
Major
Computational Engineering (Program)
Degree Name
Doctor of Philosophy
College
James Worth Bagley College of Engineering
Department
Computational Engineering Program
Abstract
In this study we build solutions to three common challenges in the field of bioinformatics by applying statistical methods and developing computational approaches. First, we address a common problem in genome-wide association studies: linking genotype features within organisms of the same species to their phenotype characteristics. We specifically studied FHA domain genes in Arabidopsis thaliana distributed across Eurasian regions by clustering plants that share similar genotype characteristics and comparing the clusters to the regions from which the plants were taken. Second, we developed a tool for calculating transposable element density within different regions of a genome. The tool is built to use the information provided by other transposable element annotation tools and to give the user a number of options for calculating the density for various genomic elements, such as genes, piRNA, and miRNA, or for the whole genome. It also provides a detailed calculation of densities for each family and subfamily of transposable elements. Finally, we address the problem of mapping multi-reads in the genome and their effects on gene expression. To accomplish this, we implemented methods to determine the statistical significance of expression values within genes using a unique- and multi-read weighting scheme. We believe this approach provides a much more accurate measure of gene expression than existing methods, such as discarding multi-reads completely or assigning them randomly among a set of best assignments, while also providing a better estimate of the proper mapping locations of ambiguous reads. Overall, the solutions we built in these studies provide researchers with tools and approaches that help solve some of the common challenges that arise in the analysis of high-throughput sequence data.
URI
https://hdl.handle.net/11668/19038
Recommended Citation
Aldwairi, Tamer Ali, "Computational Methods for Solving Next Generation Sequencing Challenges" (2014). Theses and Dissertations. 1140.
https://scholarsjunction.msstate.edu/td/1140