Advisor

Perkins, Andy D.

Committee Member

Hoffmann, Federico G.

Committee Member

DuBien, Janice

Committee Member

Wan, Xiu-Feng (Henry)

Date of Degree

1-1-2014

Document Type

Dissertation - Open Access

Major

Computational Engineering (Program)

Degree Name

Doctor of Philosophy

Abstract

In this study we build solutions to three common challenges in the fields of bioinformatics through utilizing statistical methods and developing computational approaches. First, we address a common problem in genome wide association studies, which is linking genotype features within organisms of the same species to their phenotype characteristics. We specifically studied FHA domain genes in Arabidopsis thaliana distributed within Eurasian regions by clustering those plants that share similar genotype characteristics and comparing that to the regions from which they were taken. Second, we also developed a tool for calculating transposable element density within different regions of a genome. The tool is built to utilize the information provided by other transposable element annotation tools and to provide the user with a number of options for calculating the density for various genomic elements such as genes, piRNA and miRNA or for the whole genome. It also provides a detailed calculation of densities for each family and subamily of the transposable elements. Finally, we address the problem of mapping multi reads in the genome and their effects on gene expression. To accomplish this, we implemented methods to determine the statistical significance of expression values within the genes utilizing both a unique and multi-read weighting scheme. We believe this approach provides a much more accurate measure of gene expression than existing methods such as discarding multi reads completely or assigning them randomly to a set of best assignments, while also providing a better estimation of the proper mapping locations of ambiguous reads. Overall, the solutions we built in these studies provide researchers with tools and approaches that aid in solving some of the common challenges that arise in the analysis of high throughput sequence data.

URI

https://hdl.handle.net/11668/19038

Share

COinS