Theses and Dissertations

Issuing Body

Mississippi State University


Swan II, J. Edward

Committee Member

Bridges, Susan M.

Committee Member

Hodges, Julia E.

Date of Degree


Document Type

Graduate Thesis - Open Access


Computer Science

Degree Name

Master of Science


James Worth Bagley College of Engineering


Department of Computer Science and Engineering


Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires 1/10 the time.
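The idea of limiting a search to one volume of the semantic index can be illustrated with a minimal sketch. This is not SCRIBE's actual data structure, which the abstract does not specify. It assumes documents have already been projected into a low-dimensional semantic space, and it substitutes a plain uniform grid for the thesis's volumes. All names (`cell_of`, `build_index`, `search`), the cell size, and the sample vectors are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cell_of(vec, cell_size):
    """Integer coordinates of the grid cell (volume) that contains vec."""
    return tuple(math.floor(x / cell_size) for x in vec)


def build_index(doc_vectors, cell_size):
    """Group document ids by the grid cell their vector falls in."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        index[cell_of(vec, cell_size)].append(doc_id)
    return index


def search(query, doc_vectors, index, cell_size, top_k=3):
    """Rank only the documents that share the query's cell."""
    candidates = index.get(cell_of(query, cell_size), [])
    ranked = sorted(
        candidates,
        key=lambda doc_id: cosine(query, doc_vectors[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical 2-D "semantic" vectors, used only for illustration.
docs = {
    "d1": (0.9, 0.1),
    "d2": (0.8, 0.2),
    "d3": (0.1, 0.9),
    "d4": (0.2, 0.7),
}
index = build_index(docs, cell_size=0.5)
print(search((0.9, 0.1), docs, index, cell_size=0.5))
```

Only the documents in the query's cell are scored, so the cost of a query depends on the size of that cell and not on the size of the whole collection. A single-cell lookup like this one can miss close neighbors that lie just across a cell boundary, so a practical system would also need to examine adjacent volumes.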



singular value decomposition||knowledge discovery in data