A deep learning method for voice word spotting in Persian language

Document Type : Original Article

Author

Amin university

10.22034/jasp.2023.55549.1219

Abstract

With the development of technologies for recording and transmitting audio data, and with advances in artificial intelligence, the analysis of human speech by intelligent machines has grown considerably. One of the most important speech processing technologies of the last decade is word search in audio. Given keywords from the user, a word search system checks whether each word is present in an audio file and reports the result. Because the available data sets are insufficient, it is difficult to develop software that finds every occurrence of the user's words in the audio. In this research, 42 hours of Persian audio data were collected and combined with other data available in this field, and two deep neural network architectures were designed, trained and evaluated. The two architectures are complementary, and using them together can increase the accuracy of word search. The first architecture is designed to produce the phonemes corresponding to the sound without any language model information, whereas the second architecture, which has multiple layers and a language model, tries to produce text that is correct for the Persian language. The proposed combined method detects keywords present in the audio with 88.01% accuracy and determines the absence of words with 99.80% accuracy, which is higher than the accuracy of similar methods.
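The abstract does not say how the outputs of the two architectures are combined. As a minimal sketch, assuming the simplest possible fusion rule (a keyword counts as present if either branch finds it), the decision step could look like the Python below. The function names, the rule itself, and the transliterated toy inputs are all hypothetical illustrations and are not taken from the article; the two decoded outputs stand in for whatever the trained networks would produce.

```python
from typing import Sequence


def contains_run(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """Return True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


def spot_keyword(
    keyword_phonemes: Sequence[str],
    keyword_text: str,
    decoded_phonemes: Sequence[str],
    decoded_text: str,
) -> bool:
    """Hypothetical fusion rule: report the keyword if either branch finds it.

    decoded_phonemes: output of the phoneme branch (no language model).
    decoded_text:     output of the text branch (with a language model).
    """
    phoneme_hit = contains_run(decoded_phonemes, keyword_phonemes)
    text_hit = keyword_text in decoded_text.split()
    return phoneme_hit or text_hit


# Toy, transliterated example (assumed values, not real model output).
# The text branch has mis-transcribed the first word, but the phoneme
# branch still contains the keyword's phoneme sequence.
phonemes = ["s", "a", "l", "a", "m", "d", "o", "n", "y", "a"]
text = "kalam donya"

found_salam = spot_keyword(["s", "a", "l", "a", "m"], "salam", phonemes, text)
found_ketab = spot_keyword(["k", "e", "t", "a", "b"], "ketab", phonemes, text)
```

With an either-branch rule, the phoneme branch can recover words that the language model rewrites, while the text branch can recover words whose phonemes were decoded imperfectly. A real system might instead weight or threshold the scores of the two branches.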

Keywords

Main Subjects