Lazarou, Georgios Y.
Younan, Nicholas H.
Graduate Thesis - Open Access
Supervised learning using Hidden Markov Models has been used to train acoustic models for automatic speech recognition for several years. Typically, clean transcriptions form the basis for this training regimen. However, results have shown that using readily available transcriptions that can at times be erroneous (e.g., closed captions) does not significantly degrade performance. This work analyzes the effects of mislabeled data on recognition accuracy. For this purpose, training is performed on manually corrupted training data, and the results are observed on three databases: TIDigits, Alphadigits, and SWITCHBOARD. For Alphadigits, with 16% of the training data mislabeled, system performance degrades by 12% relative to the baseline. For a complex task like SWITCHBOARD, with 16% mislabeled training data, performance degrades by 8.5% relative to the baseline. The training process is relatively robust to mislabeled data because the Gaussian mixtures used to model the underlying distributions tend to cluster around the majority of the correct data, so the outliers (incorrect data) do not contribute significantly to the reestimation process.
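The robustness argument in the abstract can be illustrated with a toy sketch. It is not the thesis's experimental setup: it assumes hypothetical one-dimensional "frames" assigned to a single model, where correct frames follow N(0, 1) and 16% mislabeled frames come from another class, N(6, 1). A single Gaussian's mean is pulled toward the outliers, while EM reestimation of a two-component mixture keeps its dominant component centered on the correct data. All function names and distribution parameters here are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def make_frames(n, mislabel_rate, seed=0):
    """Hypothetical frames for one model: correct ones ~ N(0, 1),
    mislabeled ones (really from another class) ~ N(6, 1)."""
    rng = random.Random(seed)
    n_bad = int(round(n * mislabel_rate))
    good = [rng.gauss(0.0, 1.0) for _ in range(n - n_bad)]
    bad = [rng.gauss(6.0, 1.0) for _ in range(n_bad)]
    return good + bad


def fit_gmm(xs, init_means, iters=50):
    """Plain EM for a 1-D Gaussian mixture, starting from init_means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: reestimate weights, means and variances from soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(1e-3, var_j)  # variance floor avoids collapse
    return weights, means, variances


frames = make_frames(1000, 0.16)
single_mean = sum(frames) / len(frames)  # one Gaussian absorbs the outliers
weights, means, _ = fit_gmm(frames, [1.0, 5.0])
dominant = max(range(len(weights)), key=lambda j: weights[j])
print(f"single Gaussian mean: {single_mean:.2f}")
print(f"dominant component: weight {weights[dominant]:.2f}, mean {means[dominant]:.2f}")
```

Under these assumptions the single-Gaussian mean drifts toward 16% of the outlier offset, whereas the heaviest mixture component stays close to the correct-data mean and the mislabeled frames are absorbed by a low-weight component. Real acoustic features are multidimensional and less cleanly separated, so the effect in practice is weaker.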
Sundaram, Ramasubramanian H, "Effects of Transcription Errors on Supervised Learning in Speech Recognition" (2003). Theses and Dissertations MSU. 1815.