Theses and Dissertations

Title

WebDoc an Automated Web Document Indexing System

Author

Bo Tang

Issuing Body

Mississippi State University

Advisor

Hodges, Julia

Date of Degree

1-1-2002

Document Type

Graduate Thesis - Open Access

Degree Name

Master of Science

College

James Worth Bagley College of Engineering

Department

Department of Computer Science

Abstract

This thesis describes WebDoc, an automated system that classifies Web documents according to the Library of Congress classification system. This work is an extension of an early version of the system that successfully generated indexes for journal articles. The unique features of Web documents, as well as how they will affect the design of a classification system, are discussed. We argue that full-text analysis of Web documents is inevitable, and contextual information must be used to assist the classification. The architecture of the WebDoc system is presented. We performed experiments on it with and without the assistance of contextual information. The results show that contextual information improved the system's performance significantly.

URI

https://hdl.handle.net/11668/21259

