Theses and Dissertations


Sarah Harun

Issuing Body

Mississippi State University


Zhang, Song

Committee Member

Swan II, J. Edward

Committee Member

Archibald, Christopher

Committee Member

Medal, Hugh R.

Committee Member

Jankun-Kelly, T.J.

Other Advisors or Committee Members

Keith, Jason M.

Document Type

Dissertation - Open Access


Computer Science

Degree Name

Doctor of Philosophy


James Worth Bagley College of Engineering


Department of Computer Science and Engineering


Botnets are networks of machines infected by malware called bots. Detecting these malicious networks is a major concern because they pose a serious threat to network security. Most research on botnet detection relies on characteristics specific to particular botnets, so the resulting methods fail to detect other types of botnets. Several generic detection methods can detect a variety of botnets. However, they perform very poorly on real-life data because they were not developed using a real-life botnet dataset. A crucial reason for this is the scarcity of large-scale, real-life botnet datasets: because of security and privacy concerns, organizations do not publish their botnet data. There is therefore a pressing need for a simulation methodology that generates a large-scale botnet dataset resembling the original real-life dataset while preserving the security and privacy of the network. In this dissertation, we develop a generic bot detection methodology that can detect a variety of bots, and we evaluate it on a large, real-life, highly class-imbalanced dataset. Numerical results show that our methodology detects bots more accurately than existing methods. Recognizing the need for large-scale, real-life botnet datasets, we also develop a methodology to simulate a large-scale botnet dataset from a real-life one. This simulation methodology is based on a Markov chain and a role-mining process, and it can reproduce the degree distributions along with the triangles (community structures) of the original graph. To scale the original graph up to a large-scale graph, we further propose a scaling-up algorithm, the Enterprise connection algorithm. We evaluate the simulated graph by comparing it with the original graph and with a graph generated by the Preferential attachment algorithm.
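The abstract uses the Preferential attachment algorithm as the baseline graph generator. As a rough illustration of what such a baseline does, the sketch below implements one common textbook (Barabási-Albert-style) variant. The function name, the seed graph (a complete graph on m + 1 nodes), and the parameters n and m are assumptions chosen for illustration; they are not the configuration used in the dissertation.

```python
import random


# Hypothetical helper: one common preferential attachment variant, for illustration only.
def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph where each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Assumed seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in this list once per incident edge, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    endpoints = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj


g = preferential_attachment(50, 2, seed=1)
print(max(len(nbrs) for nbrs in g.values()))  # degree of the largest hub
```

Degree-proportional sampling tends to produce a few high-degree hubs, while graphs grown this way are known to have comparatively low clustering, that is, relatively few triangles per node.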
The comparisons fall into three major categories: botnet subgraphs, overall graphs, and scaled-up graphs. The results demonstrate that our methodology outperforms the Preferential attachment algorithm in simulating the triangle distributions and the botnet structure.
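These comparisons rest on graph statistics such as degree and triangle distributions. The following is a minimal sketch of how such statistics might be computed and compared for two graphs stored as adjacency sets (undirected, no self-loops). The abstract does not name a distance measure, so the use of total variation distance here is an assumption, and all function names are hypothetical.

```python
from collections import Counter


# Hypothetical helpers for comparing graph statistics; illustrative sketch only.
def triangles_per_node(adj):
    """Number of triangles each node belongs to (undirected, simple graph)."""
    counts = {}
    for v, nbrs in adj.items():
        # Each triangle through v is one edge between two neighbours of v;
        # iterating over every neighbour finds each such edge twice.
        links = sum(len(adj[u] & nbrs) for u in nbrs)
        counts[v] = links // 2
    return counts


def distribution(values):
    """Fraction of nodes taking each value, e.g. a degree or triangle distribution."""
    values = list(values)
    total = len(values)
    return {k: c / total for k, c in sorted(Counter(values).items())}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 means identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_per_node(adj))                    # {0: 1, 1: 1, 2: 1, 3: 0}
print(distribution(len(n) for n in adj.values())) # {1: 0.25, 2: 0.5, 3: 0.25}
```

For example, passing the triangle distributions of an original and a simulated graph to `total_variation` yields a single number in [0, 1], where smaller values indicate closer agreement.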