Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Peterson, G. Daniel
Committee Member
Bridges, Susan
Committee Member
Perkins, Andy
Committee Member
Hansen, Eric
Committee Member
Hodges, Julia
Other Advisors or Committee Members
Dandass, Yoginder
Date of Degree
8-8-2009
Document Type
Dissertation - Open Access
Major
Computer Science
Degree Name
Doctor of Philosophy (Ph.D.)
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
Our knowledge discovery algorithm combines association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data, where the data is viewed as a one-dimensional space. We first apply techniques and metrics from association rule mining to identify frequently co-occurring features in genomes, and then use graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we show that our algorithm, ProxMiner, can be successfully applied to identify weakly conserved patterns among features in genomic data. The use of pairwise spatial relationships increases the sensitivity of our algorithm, while a confidence threshold based on false discovery rate reduces the noise in our results. Unlike available defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families to identify larger, more complete repeat families. ProxMiner will increase the effectiveness of repeat discovery techniques for newly sequenced genomes, where ab initio repeat finders are able to identify only partial repeat families. In this dissertation, we provide two detailed examples of novel repeat families discovered by ProxMiner and one example of a known rice repeat family that ProxMiner has extended. These examples cover some of the different types of repeat families that our algorithm can discover. We have also discovered many other potentially interesting novel repeat families that biologists can study further.
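The two-stage idea described in the abstract (pairwise proximity rules, then graph-based grouping) can be illustrated with a minimal sketch. This is not ProxMiner's implementation: the function names `mine_proximity_rules` and `group_families`, the `(family, start, end)` feature layout, the `max_gap` distance, and the fixed `min_confidence` cutoff are all assumptions made for illustration. In particular, the dissertation derives its confidence threshold from a false discovery rate, which this sketch replaces with a plain constant, and connected components stand in for whatever graph mining step the dissertation actually uses.

```python
from collections import defaultdict


def mine_proximity_rules(features, max_gap, min_confidence):
    """Find pairwise rules "family A occurs near family B".

    features: list of (family, start, end) on one sequence (hypothetical layout).
    max_gap: largest distance between two features that still counts as "near".
    min_confidence: fixed cutoff (an assumption; the source uses an FDR-based one).
    Returns {(A, B): confidence}, where confidence is the fraction of A
    occurrences that have at least one B occurrence nearby.
    """
    feats = sorted(features, key=lambda f: f[1])  # order along the 1-D sequence
    family_count = defaultdict(int)
    for family, _, _ in feats:
        family_count[family] += 1

    # (A, B) -> indices of A occurrences that have a B within max_gap
    near = defaultdict(set)
    for i, (fam_a, _, end_a) in enumerate(feats):
        for j in range(i + 1, len(feats)):
            fam_b, start_b, _ = feats[j]
            if start_b - end_a > max_gap:
                break  # later features start even further away
            if fam_a != fam_b:
                near[(fam_a, fam_b)].add(i)
                near[(fam_b, fam_a)].add(j)

    rules = {}
    for (fam_a, fam_b), occurrences in near.items():
        confidence = len(occurrences) / family_count[fam_a]
        if confidence >= min_confidence:
            rules[(fam_a, fam_b)] = confidence
    return rules


def group_families(rules):
    """Group families linked by any retained rule (connected components)."""
    adjacency = defaultdict(set)
    for fam_a, fam_b in rules:
        adjacency[fam_a].add(fam_b)
        adjacency[fam_b].add(fam_a)

    seen, groups = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [("R1", 0, 100), ("R2", 120, 200),
           ("R1", 1000, 1100), ("R2", 1130, 1200),
           ("R3", 5000, 5100)]
    rules = mine_proximity_rules(toy, max_gap=50, min_confidence=0.5)
    print(rules)
    print(group_families(rules))
```

In the toy data, the hypothetical families R1 and R2 always occur within 50 positions of each other, so they are grouped as candidates for one larger family, while the isolated R3 produces no rule.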
URI
https://hdl.handle.net/11668/15408
Recommended Citation
Saha, Surya, "Proximity based association rules for spatial data mining in genomes" (2009). Theses and Dissertations. 3676.
https://scholarsjunction.msstate.edu/td/3676
Comments
association rule mining||spatial rules||repeat||defragmentation||graph mining||novel repeat regions||DNA