A hybrid CNN-LSTM model for speaker independent command word recognition

  • Ebun Phillip Fasina, Department of Computer Sciences, University of Lagos, Akoka, Nigeria
  • Babatunde Alade Sawyerr, Department of Computer Sciences, University of Lagos, Akoka, Nigeria
  • Chibuzor Nwalor, Department of Computer Sciences, University of Lagos, Akoka, Nigeria
  • Ogban Ugot, Department of Computer Sciences, University of Lagos, Akoka, Nigeria
Keywords: Command Word Recognition, Convolutional Neural Network, Long Short-Term Memory, Deep Learning, Recurrent Neural Network, Natural Language Processing

Abstract

Automatic speech keyword recognition is an important subset of general speech recognition. It is especially relevant in settings with limited computational resources, such as voice command recognition on low-power, low-memory devices and in robot interaction. This paper introduces a method for efficient speaker-independent, real-time command word recognition using a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network with only 9.8K trainable parameters. The CNN extracts short-term spatial features from the Mel Frequency Cepstral Coefficients (MFCCs) of command words arranged in an image-like format, and the LSTM learns long-term dependencies across these extracted features. The model is trained and evaluated on the Google Speech Commands dataset, where it achieves an accuracy of 83% with a memory requirement of only 2-5% of that of state-of-the-art models, and it responds faster than off-the-shelf models.
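To give a feel for how a CNN-LSTM of this kind can stay near a 10K-parameter budget, the sketch below traces shapes and counts weights for a small model. Every layer size in it is a hypothetical illustration chosen so the total lands in the same range as the abstract's 9.8K figure; it is not the architecture reported in the paper. Further assumptions: 1 s clips at 16 kHz (the Speech Commands clip format), 40 MFCCs per frame with a 25 ms window and 10 ms hop, 12 output classes (a common Speech Commands setup of ten words plus "silence" and "unknown"), and the single-bias LSTM parameter convention used by Keras.

```python
"""Back-of-the-envelope sizing for a small CNN-LSTM keyword spotter.

All layer sizes below are hypothetical illustrations, not the
configuration reported in the paper.
"""


def mfcc_frames(n_samples: int, win: int, hop: int) -> int:
    # Number of full analysis windows (no padding) that fit in the signal.
    return 1 + (n_samples - win) // hop


def conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    # One k_h x k_w x c_in kernel per output channel, plus one bias each.
    return k_h * k_w * c_in * c_out + c_out


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with input weights, recurrent weights and a bias
    # (single-bias convention, as in Keras; PyTorch adds a second bias).
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in: int, n_out: int) -> int:
    # Fully connected layer: weight matrix plus one bias per output.
    return n_in * n_out + n_out


def size_model(n_mfcc=40, n_classes=12, sample_rate=16_000,
               win_ms=25, hop_ms=10, lstm_units=12):
    # Assumed 1 s clip -> MFCC "image" of shape (frames, n_mfcc, 1).
    win = sample_rate * win_ms // 1000
    hop = sample_rate * hop_ms // 1000
    frames = mfcc_frames(sample_rate, win, hop)

    # Hypothetical: two 3x3 'same'-padded convolutions, each followed by
    # (1, 2) pooling that halves only the frequency axis, so the time axis
    # survives intact as the LSTM's sequence dimension.
    convs = [(1, 8), (8, 16)]
    freq = n_mfcc
    conv_total = 0
    for c_in, c_out in convs:
        conv_total += conv2d_params(3, 3, c_in, c_out)
        freq //= 2
    # Flatten (freq, channels) into one feature vector per time step.
    step_dim = freq * convs[-1][1]

    lstm_total = lstm_params(step_dim, lstm_units)
    # Classify from the LSTM's final hidden state.
    dense_total = dense_params(lstm_units, n_classes)
    return {
        "input_shape": (frames, n_mfcc, 1),
        "lstm_input_shape": (frames, step_dim),
        "conv": conv_total,
        "lstm": lstm_total,
        "dense": dense_total,
        "total": conv_total + lstm_total + dense_total,
    }


if __name__ == "__main__":
    for name, value in size_model().items():
        print(f"{name}: {value}")
```

With these assumed sizes the input is a 98 x 40 MFCC image, the LSTM sees 98 steps of 160 features, and the total comes to 9,708 parameters, of which 8,304 sit in the LSTM. In a design like this the per-step feature width handed to the LSTM is the main lever on model size, which is one reason to let the convolutional front end pool away frequency detail before the recurrent layer.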

Published: 2023-06-27
How to Cite
Fasina, E. P., Sawyerr, B. A., Nwalor, C., & Ugot, O. (2023). A hybrid CNN-LSTM model for speaker independent command word recognition. Journal of Scientific Research and Development, 22(1), 115-125. Retrieved from http://jsrd.unilag.edu.ng/article/view/2363
Section: Articles