Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Bridges, Susan M.
Committee Member
Williams, W. Paul
Committee Member
McCarthy, Fiona M.
Committee Member
Allen, Edward B.
Date of Degree
12-11-2009
Document Type
Graduate Thesis - Open Access
Major
Computer Science
Degree Name
Master of Science
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
This thesis presents a method for integrating heterogeneous gene/protein datasets at the functional level based on Gene Ontology (GO) term similarity. Biologists often want to integrate heterogeneous datasets obtained from different biological samples. A major challenge in this process is how to link the heterogeneous datasets. Currently, the most common approach is to link them through common reference database identifiers, which tends to yield only a small number of matching identifiers because of the lack of standard accession schemes. As a result, biologists may recognize only the biological phenomena revealed by each dataset individually, and not those revealed by the combination of the data. We discuss an approach for integrating heterogeneous datasets by computing the similarity among them based on the similarity of their GO annotations. We then group the genes and/or proteins with similar annotations by applying a hierarchical clustering algorithm. The results demonstrate a more comprehensive understanding of the biological processes involved.
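The two steps named in the abstract (scoring similarity from GO annotations, then hierarchical clustering) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which similarity measure or linkage rule is used, so the Jaccard overlap of GO term sets, the average-linkage rule, the stopping threshold, and the identifiers `geneA`, `protX`, `geneB`, `protY` with their term assignments are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two GO term sets (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(annotations, threshold):
    """Average-linkage agglomerative clustering over GO term sets.

    Merging stops once the most similar pair of clusters has an
    average similarity below `threshold` (a hypothetical cutoff).
    """
    clusters = [[name] for name in sorted(annotations)]

    def avg_sim(c1, c2):
        total = sum(
            jaccard(annotations[x], annotations[y]) for x in c1 for y in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (
                ((i, j), avg_sim(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair_score: pair_score[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Hypothetical entries from two datasets that share no accession scheme.
# The term assignments are illustrative placeholders, not curated annotations.
annotations = {
    "geneA": {"GO:0006950", "GO:0009408"},
    "protX": {"GO:0006950", "GO:0009408", "GO:0006457"},
    "geneB": {"GO:0006412"},
    "protY": {"GO:0006412", "GO:0003735"},
}

groups = cluster(annotations, threshold=0.4)
print(groups)
```

In this toy input, no two entries share an identifier, so identifier-based linking would match nothing, while grouping by annotation overlap pairs each gene with a protein from the other dataset. A set-overlap score ignores the structure of the ontology; a measure that accounts for relationships between GO terms would be a natural substitute.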
URI
https://hdl.handle.net/11668/20406
Recommended Citation
Thanthiriwatte, Chamali Lankara, "A Method for Integrating Heterogeneous Datasets based on GO Term Similarity" (2009). Theses and Dissertations. 176.
https://scholarsjunction.msstate.edu/td/176