Author

Harun Pirim

Advisor

Eksioglu,Burak

Committee Member

Greenwood, Allen

Committee Member

Perkins, Andy D.

Committee Member

Yuceer, Cetin

Committee Member

Jin, Mingzhou

Date of Degree

1-1-2011

Document Type

Dissertation - Open Access

Major

Industrial Engineering

Degree Name

Doctor of Philosophy

College

James Worth Bagley College of Engineering

Department

Department of Industrial and Systems Engineering

Abstract

A new minimum spanning tree (MST) based heuristic for clustering biological data is proposed. The heuristic uses MSTs to generate initial solutions and applies a local search to improve the solutions. Local search transfers the nodes to the clusters with which they have the most connections, if this transfer improves the objective function value. A new objective function is defined and used in the heuristic. The objective function considers both tightness and separation of the clusters. Tightness is obtained by minimizing the maximum diameter among all clusters. Separation is obtained by minimizing the maximum number of connections of a gene with other clusters. The objective function value calculation is realized on a binary graph generated using the threshold value and keeping the minimumpercentage of edges while the binary graph is connected. Shortest paths between nodes are used as distance values between gene pairs. The efficiency and the effectiveness of the proposed method are tested using fourteen different data sets externally and biologically. The method finds clusters which are similar to actual ones using 12 data sets for which actual clusters are known. The method also finds biologically meaningful clusters using 2 data sets for which real clusters are not known. A mixed integer programming model for clustering biological data is also proposed for future studies.

URI

https://hdl.handle.net/11668/14920

Share

COinS