Advisor

Picone, Joseph

Committee Member

Lazarou, Georgios Y.

Committee Member

Younan, Nicolas H.

Committee Member

Johnson, Corlis

Committee Member

Jonkman, Jeff

Date of Degree

1-1-2003

Document Type

Graduate Thesis - Open Access

Major

Electrical Engineering

Degree Name

Master of Science

College

College of Engineering

Department

Department of Electrical and Computer Engineering

Abstract

Over the past few years, the performance of speech recognition technology on tasks ranging from isolated digit recognition to conversational speech has improved dramatically. Performance on limited recognition tasks in noise-free environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology, along with an increase in the compute power of mobile devices, the standardization of communication protocols, and the explosion in the popularity of mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments, which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise-robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends.

URI

https://hdl.handle.net/11668/19116

Comments

voice recognition; noise robust algorithms; speech recognition; aurora evaluations; front ends
