DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters

authored by
A.M. Castro Martinez, Lukas Gerlach, Guillermo Payá-Vayá, Hynek Hermansky, Jasper Ooster, Bernd T. Meyer
Abstract

In several applications of machine listening, predicting how well an automatic speech recognition system will perform before the actual decoding enables the system to adapt dynamically to unseen acoustic characteristics. Feedback about speech quality, for instance, could allow modern hearing aids to select a speech source in complex acoustic scenes with the aim of enhancing the speech intelligibility of a target speaker. In this study, we look at different performance measures to estimate the word error rates of simulated behind-the-ear hearing aid signals and to detect the azimuth angle of the target source in 180-degree spatial scenes. These measures are derived from phoneme posterior probabilities produced by a deep neural network acoustic model. However, the more complex the model is, the more computationally expensive it becomes to obtain these measures; therefore, we assess how the model size affects prediction performance. Our findings suggest that measures derived from smaller nets predict the error rates of more complex models reliably enough to be implemented in hearing aid hardware.

Organisation(s)
Architectures and Systems Section
External Organisation(s)
Cluster of Excellence Hearing4all
Carl von Ossietzky University of Oldenburg
Johns Hopkins University
Type
Article
Journal
Speech Communication
Volume
106
Pages
44-56
No. of pages
13
ISSN
0167-6393
Publication date
01.2019
Publication status
Published
Peer reviewed
Yes
ASJC Scopus subject areas
Software, Modelling and Simulation, Communication, Language and Linguistics, Linguistics and Language, Computer Vision and Pattern Recognition, Computer Science Applications
Sustainable Development Goals
SDG 3 - Good Health and Well-being
Electronic version(s)
https://doi.org/10.1016/j.specom.2018.11.006 (Access: Closed)