Issuing Body

Mississippi State University

Advisor

McCarthy, Fiona M.

Committee Member

Nanduri, Bindu

Committee Member

Peterson, Daniel G.

Committee Member

Bridges, Susan M.

Committee Member

Burgess, Shane C.

Date of Degree

12-15-2012

Document Type

Dissertation - Open Access

Major

Veterinary Medical Science

Degree Name

Doctor of Philosophy (Ph.D.)

College

College of Veterinary Medicine

Department

Veterinary Medical Science Program

Abstract

Advances in next-generation sequencing (NGS) technologies have significantly reduced the cost per sequenced base pair and increased the volume of sequence data. On the other hand, most currently used NGS technologies produce relatively short sequence reads (50-150 bp) compared to Sanger sequencing (~700 bp). This represents an additional challenge in data analysis, because shorter reads are more difficult to assemble. At this point, production of sequencing data outpaces our capacity to analyze them. Newer NGS technologies capable of producing longer reads are emerging, which should simplify and speed up genome assembly. However, this will only increase the number of sequenced genomes that lack structural and functional annotation. In addition to multiple scientific initiatives to sequence thousands of genomes, personalized medicine centered on sequencing and analysis of individual human genomes will become more widely available. This poses a challenge for computer science and emphasizes the importance of developing new computational algorithms, methodologies, tools, and pipelines. This dissertation focuses on the development of such software tools, methodologies, and resources to help address the need to process the large volumes of data generated by new sequencing technologies. The research concentrated on genome structure analysis, individual variation, and comparative biology. This dissertation presents: (1) the Short Read Classification Pipeline (SRCP) for preliminary genome characterization of unsequenced genomes; (2) a novel methodology for phylogenetic analysis of closely related organisms or strains of the same organism without a sequenced genome; and (3) a centralized online resource for standardized gene nomenclature. Using the SRCP and the methodology for initial phylogenetic analysis developed in this dissertation makes it possible to position an organism in its evolutionary context. This should facilitate identification of orthologs between species and paralogs within a species even at the initial stage of analysis, when only the exome is sequenced, and thus enable functional annotation by transferring gene nomenclature from well-annotated 1:1 orthologs, as required by the online standardized gene nomenclature resource developed in this dissertation. Thus, the tools, methodology, and resources presented here are tied together by following the initial analysis workflow for structural and functional annotation.

URI

https://hdl.handle.net/11668/20369

Recommended Citation

Chouvarine, Philippe, "Genomic and Functional Analysis of Next-Generation Sequencing Data" (2012). Theses and Dissertations. 2407.
https://scholarsjunction.msstate.edu/td/2407

Brought to you by MSU Libraries
