Theses and Dissertations

ORCID

https://orcid.org/0009-0003-3600-1531

Issuing Body

Mississippi State University

Advisor

Rahimi, Shahram

Committee Member

Gudla, Charan

Committee Member

Mittal, Sudip

Date of Degree

12-13-2024

Original embargo terms

Visible to MSU only for 6 months

Document Type

Graduate Thesis - Campus Access Only

Major

Computer Science (Research)

Degree Name

Master of Science (M.S.)

College

James Worth Bagley College of Engineering

Department

Department of Computer Science and Engineering

Abstract

As the field of Natural Language Processing (NLP) continues to evolve, evaluating the performance of both proprietary and open-source language models has become increasingly critical. This research analyzes proprietary models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo) alongside open-source models (FLAN-T5, GPT-Neo, and GPT-2). Using traditional metrics such as ROUGE and BLEU, as well as custom metrics including ReGrAde, Contextual Precision, and Faithfulness, the study evaluates these models on closed-domain tasks (e.g., factual question answering) and open-domain tasks (e.g., creative writing and brainstorming). The proprietary models excelled in structured, fact-based tasks, while the open-source models exhibited strengths in creative and open-ended tasks. The study highlights the limitations of both model types, particularly in summarization and in maintaining faithfulness to input data. The findings offer insights into the future development of language models and their application in both research and practical settings.
