Theses and Dissertations
Issuing Body
Mississippi State University
Advisor
Swan II, J. Edward
Committee Member
Bridges, Susan M.
Committee Member
Hodges, Julia E.
Date of Degree
8-5-2006
Document Type
Graduate Thesis - Open Access
Major
Computer Science
Degree Name
Master of Science
College
James Worth Bagley College of Engineering
Department
Department of Computer Science and Engineering
Abstract
Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires one-tenth of the time.
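The general idea summarized in the abstract, searching only one cluster of an LSI index instead of every document, can be illustrated with a minimal sketch. This is not SCRIBE's implementation: the abstract does not specify its volume data structures, so a plain spherical k-means over the latent document vectors is swapped in as a stand-in. All function names (`build_index`, `search`, `_unit`) and parameters are hypothetical.

```python
import numpy as np


def _unit(x):
    """Scale vectors (along the last axis) to unit length, guarding against zeros."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def build_index(term_doc, rank, n_clusters, n_iters=10, seed=0):
    """Build a clustered LSI index from a (terms x docs) matrix.

    Assumption: k-means on cosine similarity stands in for SCRIBE's
    clustering, which the abstract describes only in general terms.
    """
    # LSI step: truncated singular value decomposition of the term-document matrix.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :rank]
    docs = _unit(Vt[:rank].T * s[:rank])  # (docs x rank) latent coordinates

    # Clustering step: start from randomly chosen documents as centroids.
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        labels = np.argmax(docs @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = docs[labels == c]
            if len(members) > 0:
                centroids[c] = _unit(members.mean(axis=0))
    labels = np.argmax(docs @ centroids.T, axis=1)
    return U_k, docs, centroids, labels


def search(query_terms, U_k, docs, centroids, labels, top_n=5):
    """Rank only the documents in the cluster whose centroid is closest to the query."""
    q = _unit(query_terms @ U_k)            # fold the query into the latent space
    best = int(np.argmax(centroids @ q))    # pick the nearest cluster
    candidates = np.flatnonzero(labels == best)
    scores = docs[candidates] @ q           # cosine similarity within that cluster
    order = np.argsort(-scores)[:top_n]
    return candidates[order], scores[order]
```

In this sketch a query is compared against `n_clusters` centroids and then against the members of a single cluster, rather than against every document. The trade-off is that relevant documents assigned to other clusters are never scored, which is why retrieval accuracy has to be checked experimentally against unclustered LSI.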
URI
https://hdl.handle.net/11668/17324
Recommended Citation
Langley, Joseph R., "Scribe: A Clustering Approach To Semantic Information Retrieval" (2006). Theses and Dissertations. 3869.
https://scholarsjunction.msstate.edu/td/3869
Comments
singular value decomposition||knowledge discovery in data