Theses and Dissertations
ORCID
https://orcid.org/0000-0003-0025-5524
Issuing Body
Mississippi State University
Advisor
Bhowmik, Tanmay
Committee Member
Iannucci, Stefano
Committee Member
Chen, Zhiqian
Committee Member
Torri, Stephen
Date of Degree
5-12-2023
Document Type
Dissertation - Campus Access Only
Major
Computer Science
Degree Name
Doctor of Philosophy (Ph.D.)
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
As software grows more complex, developers rely increasingly on reuse and dependencies, often by incorporating code snippets into their source code. As a result, vulnerabilities are becoming harder to identify and mitigate. Although traditional analysis tools are still in use, machine learning models are being adopted to extend these efforts and combat such threats. Research in this area has introduced a range of approaches that vary in usability and predictive performance. Generalizing toward a natural-language approach, researchers have opted to train models on source code to identify existing and potential vulnerabilities. Exploratory research has treated source code as plain text, creating "text-based" models. Motivated by the need to prevent vulnerable code snippets, we present a dissertation on the effectiveness of text-based machine learning models for vulnerability detection. We use datasets composed of open-source projects and vulnerability types to generate our own training and testing data from extracted function pairings. Using this data, we evaluate a series of text-based machine learning models, coupled with natural language processing (NLP) techniques and our own data processing methods. Through empirical research, we demonstrate the effectiveness of such models based on statistical evidence. From these results, we determine negative correlations and identify "cross-cutting" features. Finally, we analyze models after removing "cross-cutting" features, with the aim of improving performance while making model decisions more explainable.
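As a rough illustration of the text-based idea described in the abstract, the sketch below tokenizes source code as plain text and strips tokens shared by both members of a function pairing. It is not the dissertation's pipeline: the abstract does not define "cross-cutting" features, so the sketch assumes they are tokens that occur in both the vulnerable and the fixed class. The C snippets and all function names (`tokenize`, `vocabulary`, `cross_cutting`, `remove_features`) are hypothetical.

```python
import re

# Treat code as plain text: an identifier/keyword, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


def tokenize(code):
    """Split a source snippet into text tokens, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def vocabulary(snippets):
    """Return the set of distinct tokens seen across a list of snippets."""
    vocab = set()
    for snippet in snippets:
        vocab.update(tokenize(snippet))
    return vocab


def cross_cutting(vulnerable, fixed):
    """ASSUMED definition: tokens that appear in both classes."""
    return vocabulary(vulnerable) & vocabulary(fixed)


def remove_features(code, features):
    """Tokenize a snippet and drop every token listed in `features`."""
    return [tok for tok in tokenize(code) if tok not in features]


# Hypothetical function pairing: a vulnerable function and its fixed counterpart.
vulnerable = ["void copy(char *dst, char *src) { strcpy(dst, src); }"]
fixed = ["void copy(char *dst, char *src, size_t n) { strncpy(dst, src, n); }"]

shared = cross_cutting(vulnerable, fixed)
vulnerable_only = remove_features(vulnerable[0], shared)
fixed_only = remove_features(fixed[0], shared)
```

With this toy pairing, the tokens left after removal are those that distinguish one version from the other, which is one way a feature-removal step could make a text-based model's decisions easier to inspect.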
Recommended Citation
Napier, Kollin Ryne, "An analysis of text-based machine learning models for vulnerability detection" (2023). Theses and Dissertations. 5849.
https://scholarsjunction.msstate.edu/td/5849