Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
McCarthy, Fiona M.
Committee Member
Nanduri, Bindu
Committee Member
Peterson, Daniel G.
Committee Member
Bridges, Susan M.
Committee Member
Burgess, Shane C.
Date of Degree
12-15-2012
Document Type
Dissertation - Open Access
Major
Veterinary Medical Science
Degree Name
Doctor of Philosophy (Ph.D.)
College
College of Veterinary Medicine
Department
Veterinary Medical Science Program
Abstract
Advances in next-generation sequencing (NGS) technologies have significantly reduced the cost per sequenced base pair and increased the volume of sequence data. On the other hand, most currently used NGS technologies produce relatively short sequence reads (50-150 bp) compared to Sanger sequencing (~700 bp). This represents an additional challenge in data analysis, because shorter reads are more difficult to assemble. At this point, production of sequencing data outpaces our capacity to analyze them. Newer NGS technologies capable of producing longer reads are emerging, which should simplify and speed up genome assembly. However, this will only increase the number of sequenced genomes that lack structural and functional annotation. In addition to multiple scientific initiatives to sequence thousands of genomes, personalized medicine centered on sequencing and analysis of individual human genomes will become more widely available. This poses a challenge for computer science and emphasizes the importance of developing new computational algorithms, methodologies, tools, and pipelines. This dissertation focuses on the development of such software tools, methodologies, and resources to help address the need to process the large volumes of data generated by new sequencing technologies. The research concentrated on genome structure analysis, individual variation, and comparative biology. This dissertation presents: (1) the Short Read Classification Pipeline (SRCP) for preliminary genome characterization of unsequenced genomes; (2) a novel methodology for phylogenetic analysis of closely related organisms or strains of the same organism without a sequenced genome; and (3) a centralized online resource for standardized gene nomenclature. Using the SRCP and the methodology for initial phylogenetic analysis developed in this dissertation makes it possible to position an organism in its evolutionary context.
This should facilitate identification of orthologs between species and paralogs within a species even at the initial stage of the analysis, when only the exome is sequenced, and thus enable functional annotation by transferring gene nomenclature from well-annotated 1:1 orthologs, as required by the online standardized gene nomenclature resource developed in this dissertation. The tools, methodology, and resources presented here are therefore tied together by following the initial analysis workflow for structural and functional annotation.
URI
https://hdl.handle.net/11668/20369
Recommended Citation
Chouvarine, Philippe, "Genomic and Functional Analysis of Next-Generation Sequencing Data" (2012). Theses and Dissertations. 2407.
https://scholarsjunction.msstate.edu/td/2407