Theses and Dissertations

Issuing Body

Mississippi State University


Dampier, David

Committee Member

Butler, Cary

Committee Member

Vaughn, Rayford

Committee Member

Jankun-Kelly, T.J.

Date of Degree


Document Type

Dissertation - Open Access


Computer Science

Degree Name

Doctor of Philosophy


James Worth Bagley College of Engineering


Department of Computer Science and Engineering


Identifying the author of source code can be a useful tool in security and forensic investigation, because it can help produce corroborating evidence that may help convict a suspected cyber terrorist, hacker, or malicious code writer. In academia, it can also help professors who suspect students of academic dishonesty, plagiarism, or modification of source code related to programming assignments. The purpose of this dissertation is to determine whether cross-entropy approaches to source code authorship analysis can predict the correct author of a given piece of source code. If so, this work will try to identify the factors that affect the accuracy of the algorithm, how programmer experience affects accuracy, and whether a cross-entropy approach performs better than some known source code authorship approaches. The research constructs a corpus of source code written by various authors from both identical and varying system descriptions, against which the different approaches can be benchmarked.