Theses and Dissertations

Issuing Body

Mississippi State University

Advisor

Nanduri, Bindu

Committee Member

Perkins, Andy D.

Committee Member

Ramkumar, Mahalingam

Committee Member

Lawrence, Mark L.

Date of Degree

5-7-2016

Original embargo terms

MSU Only Indefinitely

Document Type

Dissertation - Campus Access Only

Major

Computational Biology

Degree Name

Doctor of Philosophy

College

College of Veterinary Medicine

Department

Veterinary Medical Science Program

Abstract

The scope and application of high-throughput techniques have expanded from studying a single genome, transcriptome, or proteome to understanding complex environments at greater resolution with the help of novel computational frameworks. Comprehensive structural annotation, i.e., the description of all functional elements in the genome, is required to measure genome response accurately using high-throughput methods. Annotating genome sequences with high-throughput data from RNA-seq and proteomics experiments complements computational methods for identifying functional elements; it can help validate existing in silico annotation, correct annotation errors, and potentially identify novel functional elements. Recent re-annotation studies have revealed the shortcomings of automated methods and the need to validate existing annotations using experimental data. This dissertation describes the re-annotation of Mannheimia haemolytica, Pasteurella multocida, and Histophilus somni, bacterial pathogens associated with bovine respiratory disease. Experimental re-annotation of these bacterial genomes using RNA-seq and proteomics enabled the validation of existing annotation and the discovery of novel functional elements that can be used in future functional genomics studies. We also addressed the need for an automated, broadly applicable bioinformatics workflow for bacterial genome re-annotation by developing an open-source Perl pipeline that takes RNA-seq and proteomics data as input. Simultaneous profiling of host and pathogen gene expression using metatranscriptomics approaches is necessary to improve our understanding of infectious diseases. Traditional methods for analyzing RNA-seq data do not address the impact of cross-mapping of reads to multiple genomes when the data originate from a metatranscriptomic study.
Analysis of sequence conservation between species can help determine a cross-mapping metric for separating signal from noise. We generated artificial RNA-seq data and evaluated the impact of read length and sequence conservation on cross-mapping. Comparative genomics was used to identify a core genome and a pan-genome for quantifying gene expression. Our results show that cross-mapping between genomes can be directly related to the evolutionary distance between them and that an increase in RNA-seq read length tends to negate cross-mapping.

URI

https://hdl.handle.net/11668/17617
