Theses and Dissertations

Issuing Body

Mississippi State University

Advisor

Perkins, Andy D.

Committee Member

Amirlatifi, Amin

Committee Member

Rahimi, Shahram

Committee Member

Mittal, Sudip

Committee Member

Chaudhary, Vini; Chen, Zhiqian

Date of Degree

8-7-2025

Original embargo terms

Visible MSU Only 1 year

Document Type

Graduate Thesis - Campus Access Only

Major

Computer Science

Degree Name

Master of Science (M.S.)

College

James Worth Bagley College of Engineering

Department

Department of Computer Science and Engineering

Abstract

Scientific abstracts are rich sources of knowledge, yet extracting meaningful topics from them remains challenging because of limitations in existing topic modeling techniques. Traditional methods often struggle with interpretability, scalability, and contextual understanding. To overcome these issues, we introduce TopNet R1, a multi-stage ensemble framework that integrates traditional topic models with contextual embeddings from the Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 4 (GPT-4) large language models (LLMs). TopNet R1 operates in three phases: (1) topic generation using Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Hierarchical Dirichlet Process (HDP); (2) LLM-assisted pattern recognition to refine and interpret the generated topics; and (3) validation via mixed-effects regression. This design improves semantic coherence and topic diversity while maintaining efficiency. We evaluated TopNet R1 on the PubMed and arXiv datasets, where it outperformed traditional models in coherence (C_v, C_UMass) and diversity (C_TD), achieving C_v = 0.6189 and C_UMass = −2.94 on PubMed.
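The abstract reports coherence (C_UMass) and diversity (C_TD) scores without stating their formulas. As a rough illustration only, the sketch below implements the commonly cited definitions: UMass coherence in the form of Mimno et al. (2011), averaging log((D(w_i, w_j) + 1) / D(w_j)) over ranked word pairs, and topic diversity as the fraction of unique words among the top-k words of all topics. The function names, the `top_k` default, and the toy corpus are hypothetical; the thesis may have used a library implementation (for example gensim's `CoherenceModel`) whose smoothing and normalization differ. C_v is omitted because it additionally requires sliding-window NPMI vectors and cosine similarity.

```python
import math
from itertools import chain


def topic_diversity(topics: list[list[str]], top_k: int = 25) -> float:
    """Fraction of unique words among the top-k words of all topics.

    1.0 means no two topics share a top word; lower values mean more overlap.
    (Assumed definition; top_k=25 is a hypothetical default.)
    """
    top_words = [t[:top_k] for t in topics]
    total = sum(len(t) for t in top_words)
    unique = set(chain.from_iterable(top_words))
    return len(unique) / total if total else 0.0


def umass_coherence(topic: list[str], docs: list[set[str]]) -> float:
    """UMass coherence for one topic, following Mimno et al. (2011).

    `topic` lists the topic's top words from highest to lowest rank.
    `docs` is the reference corpus, each document reduced to its set of words.
    Averages log((D(w_i, w_j) + 1) / D(w_j)) over pairs where w_j outranks w_i,
    with D counting the documents that contain all of the given words.
    """
    def doc_freq(*words: str) -> int:
        return sum(1 for d in docs if all(w in d for w in words))

    scores = []
    for i in range(1, len(topic)):
        for j in range(i):
            d_j = doc_freq(topic[j])
            if d_j == 0:
                continue  # skip higher-ranked words absent from the corpus
            scores.append(math.log((doc_freq(topic[i], topic[j]) + 1) / d_j))
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with a hypothetical four-document corpus.
docs = [
    {"gene", "protein", "cell"},
    {"gene", "expression"},
    {"neural", "network", "training"},
    {"network", "graph"},
]
topics = [["gene", "protein", "cell"], ["network", "neural", "graph"]]
print(topic_diversity(topics))             # both topics' words are distinct
print(umass_coherence(topics[0], docs))    # mean log-ratio over the 3 word pairs
```

Because UMass uses document co-occurrence in a reference corpus, its values are typically negative on large corpora (as with the −2.94 reported above), while tiny toy corpora like this one can yield positive scores.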
