Manara - Qatar Research Repository

Word Error Rate Estimation for Speech Recognition: e-WER

conference contribution
submitted on 2024-05-29, 09:22 and posted on 2024-05-29, 11:40 authored by Ahmed Ali, Steve Renals

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: ASR recognised text, character recognition results to complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, using 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9% across 1,400 sentences. The estimated overall WER (e-WER) was 25.3% for the three-hour test set, while the actual WER was 28.5%.
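
The two metrics named in the abstract, WER and RMSE, can be illustrated with a minimal Python sketch. This is not the authors' e-WER estimator: it only shows how a reference WER is conventionally computed from a word-level edit distance and how an RMSE between estimated and actual per-sentence WER values could be obtained. The function names and the toy sentences are hypothetical and are not taken from the paper.

```python
from math import sqrt


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1], len(ref)


def wer(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    errors, n = word_errors(reference, hypothesis)
    return errors / n  # assumes a non-empty reference


def rmse(estimated, actual):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


# Toy data, invented for illustration only.
sentence_wer = wer("the cat sat on the mat", "the cat sit on mat")
error = rmse([0.2, 0.4], [0.3, 0.3])
```

In this toy case the hypothesis has one substitution and one deletion against a six-word reference, so `sentence_wer` is 2/6. An estimator such as e-WER would be scored by passing its predicted per-sentence values as `estimated` and the values computed from gold transcriptions as `actual`.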

Other Information

Published in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.18653/v1/p18-2004

Language

  • English

Publisher

Association for Computational Linguistics

Publication Year

  • 2018

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • Qatar Computing Research Institute - HBKU