Theses and Dissertations

Issuing Body

Mississippi State University

Advisor

Swan II, J. Edward

Committee Member

Bridges, Susan M.

Committee Member

Hodges, Julia E.

Date of Degree

8-5-2006

Document Type

Graduate Thesis - Open Access

Major

Computer Science

Degree Name

Master of Science

College

James Worth Bagley College of Engineering

Department

Department of Computer Science and Engineering

Abstract

Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires 1/10 the time.
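
The general idea in the abstract can be illustrated with a small sketch. This is not SCRIBE's implementation: the abstract does not specify the data structures, so the sign-pattern (orthant) bucketing below is a hypothetical stand-in for "discrete volumes", and the function names, the toy term-document matrix, and the parameters `k` and `depth` are all assumptions made for illustration. The sketch builds an LSI index with a truncated singular value decomposition, then compares ranking every document against ranking only the documents that share the query's bucket.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a (terms x documents) matrix.

    Returns U_k (terms x k) and one k-dimensional row vector per document.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = Vt[:k, :].T * s[:k]  # equals (U_k^T @ term_doc)^T
    return U_k, doc_vecs

def lsi_project(U_k, query_vec):
    """Fold a term-space query into the same k-dimensional space as doc_vecs."""
    return query_vec @ U_k

def orthant_key(vec, depth):
    """Hypothetical 'volume' id: sign pattern of the first `depth` coordinates."""
    return tuple(bool(x >= 0) for x in vec[:depth])

def build_buckets(doc_vecs, depth):
    """Assign every document index to exactly one bucket."""
    buckets = {}
    for i, v in enumerate(doc_vecs):
        buckets.setdefault(orthant_key(v, depth), []).append(i)
    return buckets

def rank_by_cosine(query_k, doc_vecs, candidates):
    """Return candidate document indices sorted by descending cosine similarity."""
    cand = np.asarray(list(candidates), dtype=int)
    vecs = doc_vecs[cand]
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_k)
    sims = (vecs @ query_k) / np.where(norms == 0, 1.0, norms)
    return cand[np.argsort(-sims)]

# Toy data (assumed): rows are terms, columns are documents.
term_doc = np.array([
    [2, 1, 0, 1, 0],
    [1, 2, 0, 0, 1],
    [0, 1, 2, 1, 0],
    [1, 0, 1, 3, 0],
    [0, 0, 1, 2, 2],
], dtype=float)

n_docs = term_doc.shape[1]
U_k, doc_vecs = lsi_fit(term_doc, k=2)
buckets = build_buckets(doc_vecs, depth=2)

# Use the first document's term vector as the query.
query_k = lsi_project(U_k, term_doc[:, 0])

# Exhaustive search scores every document.
full = rank_by_cosine(query_k, doc_vecs, range(n_docs))

# Restricted search scores only documents in the query's bucket.
restricted = rank_by_cosine(
    query_k, doc_vecs, buckets.get(orthant_key(query_k, 2), [])
)
```

The restricted search scores no more documents than the exhaustive one, which is the source of the speed-up the abstract describes. A bucketing this coarse can miss relevant documents that fall just across a boundary, so it should be read only as an illustration of the trade-off, not as a claim about how SCRIBE partitions its index.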

URI

https://hdl.handle.net/11668/17324

Comments

singular value decomposition||knowledge discovery in data
