
Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Perkins, Andy D.
Committee Member
Amirlatifi, Amin
Committee Member
Rahimi, Shahram
Committee Member
Mittal, Sudip
Committee Member
Chaudhary, Vini; Chen, Zhiqian
Date of Degree
8-7-2025
Original embargo terms
Visible MSU Only 1 year
Document Type
Graduate Thesis - Campus Access Only
Major
Computer Science
Degree Name
Master of Science (M.S.)
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
Scientific abstracts are rich sources of knowledge, yet extracting meaningful topics remains challenging due to limitations in existing topic modeling techniques. Traditional methods often struggle with interpretability, scalability, and contextual understanding. To overcome these issues, we introduce TopNet R1, a multi-stage ensemble framework that integrates traditional topic models with contextual embeddings from Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 4 (GPT-4) large language models (LLMs). TopNet R1 operates in three phases: (1) topic generation using Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Hierarchical Dirichlet Process (HDP); (2) LLM-assisted pattern recognition to refine and interpret topics; and (3) validation via mixed-effects regression. This design improves semantic coherence and diversity while maintaining efficiency. We evaluated TopNet R1 on PubMed and arXiv, where it outperformed traditional models in coherence (C_v, C_UMass) and diversity (C_TD), achieving C_v = 0.6189 and C_UMass = −2.94 on PubMed.
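As a rough illustration of the diversity metric named in the abstract, the sketch below computes a topic diversity score over topics pooled from several models. It assumes that C_TD is the commonly used definition (the fraction of unique words among the top-k words of all topics); the abstract does not state the exact formula used in the thesis. The function name, the toy topics, and the pooling step are hypothetical and are not taken from TopNet R1.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Return the fraction of unique words among the top_k words of each topic.

    Assumed definition: 1.0 means no word is shared between topics,
    values near 0 mean the topics are highly redundant.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    # Flatten the top_k words of every topic into one list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical top words from two of the base models (not real results).
lda_topics = [["gene", "protein", "cell"], ["model", "network", "cell"]]
nmf_topics = [["gene", "expression", "tumor"], ["model", "training", "data"]]

# Pool the topics from both models, as an ensemble might before refinement.
pooled_topics = lda_topics + nmf_topics
score = topic_diversity(pooled_topics, top_k=3)
print(f"Topic diversity of pooled topics: {score:.2f}")
```

A low score on the pooled set suggests that the base models are rediscovering the same topics, which is the kind of redundancy a refinement phase would be expected to reduce.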
Recommended Citation
Hossain, Md Elias, "TopNet R1: A multi-stage AI framework for topic discovery in scientific abstracts" (2025). Theses and Dissertations. 6651.
https://scholarsjunction.msstate.edu/td/6651