

Author

Surya Saha

Issuing Body

Mississippi State University


Peterson, G. Daniel

Committee Member

Bridges, Susan

Committee Member

Perkins, Andy

Committee Member

Hansen, Eric

Committee Member

Hodges, Julia

Other Advisors or Committee Members

Dandass, Yoginder

Date of Degree


Document Type

Dissertation - Open Access


Computer Science

Degree Name

Doctor of Philosophy (Ph.D.)


James Worth Bagley College of Engineering


Department of Computer Science and Engineering


Abstract

Our knowledge discovery algorithm, ProxMiner, combines association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data, treating the genome as a one-dimensional space. We apply techniques and metrics from association rule mining to identify frequently co-occurring features in genomes, and then use graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we show that ProxMiner can be successfully applied to identify weakly conserved patterns among features in genomic data. Using pairwise spatial relationships increases the sensitivity of the algorithm, while a confidence threshold based on the false discovery rate reduces noise in the results. Unlike existing defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families, which allows it to identify larger, more complete repeat families. ProxMiner can therefore increase the effectiveness of repeat discovery for newly sequenced genomes, where ab initio repeat finders can identify only partial repeat families. In this dissertation, we present two detailed examples of novel repeat families discovered by ProxMiner and one example of a known rice repeat family that ProxMiner has extended. These examples illustrate some of the different types of repeat families our algorithm can discover. We have also discovered many other potentially interesting novel repeat families that biologists can study further.
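The two-stage idea summarized in the abstract (mine pairwise proximity rules along a one-dimensional sequence, then group associated features with a graph) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ProxMiner's actual implementation: the input format `(family, chromosome, start, end)`, the `max_gap` proximity window, and the fixed `min_support`/`min_conf` cutoffs are hypothetical. The fixed confidence cutoff stands in for the false-discovery-rate-based threshold described above, and connected components are used as a simple stand-in for the graph mining step.

```python
from collections import Counter, defaultdict


def mine_rules(features, max_gap, min_support, min_conf):
    """Return {(A, B): confidence} for hypothetical proximity rules A -> B.

    Assumed semantics (illustrative only): an instance of family A supports
    A -> B if at least one instance of a different family B lies within
    max_gap bases of it on the same chromosome.
    """
    family_count = Counter(fam for fam, _, _, _ in features)
    neighbors = [set() for _ in features]  # families found near each instance

    by_chrom = defaultdict(list)
    for idx, (_, chrom, start, end) in enumerate(features):
        by_chrom[chrom].append((start, end, idx))

    for items in by_chrom.values():
        items.sort()  # order instances by start coordinate along the sequence
        for i, (_, end_i, k1) in enumerate(items):
            for start_j, _, k2 in items[i + 1:]:
                # Later items start even further right, so once the gap
                # exceeds max_gap no later item can be close to k1.
                if start_j - end_i > max_gap:
                    break
                fam1, fam2 = features[k1][0], features[k2][0]
                if fam1 != fam2:
                    neighbors[k1].add(fam2)
                    neighbors[k2].add(fam1)

    # Count, for each ordered pair (A, B), the A instances with a B neighbor.
    pair_count = Counter()
    for k, feat in enumerate(features):
        for other in neighbors[k]:
            pair_count[(feat[0], other)] += 1

    rules = {}
    for (a, b), n in pair_count.items():
        confidence = n / family_count[a]
        if n >= min_support and confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules


def co_occurring_sets(rules):
    """Group families linked by any rule (connected components, a stand-in)."""
    adj = defaultdict(set)
    for a, b in rules:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Hypothetical toy data: fragments of repeat families R1..R4 on two chromosomes.
features = [
    ("R1", "chr1", 100, 200),
    ("R2", "chr1", 250, 400),
    ("R1", "chr1", 1000, 1100),
    ("R2", "chr1", 1120, 1300),
    ("R3", "chr1", 5000, 5100),
    ("R2", "chr2", 10, 50),
    ("R4", "chr2", 60, 90),
]
rules = mine_rules(features, max_gap=100, min_support=2, min_conf=0.5)
groups = co_occurring_sets(rules)
print(rules)   # {('R1', 'R2'): 1.0, ('R2', 'R1'): 0.6666666666666666}
print(groups)  # [['R1', 'R2']]
```

In this toy run, R1 and R2 fragments repeatedly occur next to each other, so they are merged into one candidate family, while the single R2/R4 adjacency falls below the support cutoff. Lowering `min_support` to 1 would also admit the rule R4 -> R2 and merge R4 into the same group, which illustrates why a principled threshold (such as the false-discovery-rate-based one the abstract mentions) matters for controlling noise.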



Keywords

association rule mining, spatial rules, repeat, defragmentation, graph mining, novel repeat regions, DNA