<Seong-Gyun Leem>
Multitask learning of deep neural network-based keyword spotting for IoT devices
Authors: Seong-Gyun Leem, In-Chul Yoo, and Dongsuk Yook
Journal: IEEE Transactions on Consumer Electronics
Year of publication: 2019
<Abstract>
Speech-based interfaces are convenient and intuitive, and therefore, strongly preferred by Internet of Things (IoT) devices for human–computer interaction. Pre-defined keywords are typically used as a trigger to notify devices for inputting the subsequent voice commands. Keyword spotting techniques used as voice trigger mechanisms, typically model the target keyword via triphone models and non-keywords through single state filler models. Recently, deep neural networks (DNNs) have shown better performance compared to hidden Markov models with Gaussian mixture models, in various tasks including speech recognition. However, conventional DNN-based keyword spotting methods cannot change the target keywords easily, which is an essential feature for speech-based IoT device interface. Additionally, the increase in computational requirements interferes with the use of complex filler models in DNN-based keyword spotting systems, which diminishes the accuracy of such systems. In this paper, we propose a novel DNN-based keyword spotting system that alters the keyword on the fly and utilizes triphone and monophone acoustic models in an effort to reduce computational complexity and increase generalization performance. The experimental results using the FFMTIMIT corpus show that the error rate of the proposed method was reduced by 36.6%.
Description: A speech-based interface must first recognize a wake-up command (e.g., "Okay Google", "Hi Bixby") before it can be used. Most current systems support only a fixed wake-up command. This paper studies a system, built on deep neural networks, hidden Markov models, and multitask learning, that lets users freely change the wake-up command while preserving speech recognition accuracy and inference speed.
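
As a rough illustration of the multitask learning idea mentioned above, the sketch below shares one hidden layer between a triphone output head and a monophone output head and sums their cross-entropy losses. This is a generic multitask setup, not the paper's architecture: the layer sizes, the ReLU activation, the `aux_weight` factor, and all function names are assumptions made for the example, and the abstract does not specify how the two acoustic models are actually combined.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, inputs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def multitask_loss(params, features, triphone_label, monophone_label, aux_weight=0.5):
    """Weighted sum of both heads' cross-entropy, computed from one shared layer."""
    shared_w, shared_b, tri_w, tri_b, mono_w, mono_b = params
    # Shared hidden representation (ReLU is an assumption of this sketch).
    hidden = [max(0.0, h) for h in linear(shared_w, shared_b, features)]
    # Two task-specific heads read the same hidden activations.
    tri_post = softmax(linear(tri_w, tri_b, hidden))
    mono_post = softmax(linear(mono_w, mono_b, hidden))
    tri_ce = -math.log(tri_post[triphone_label])
    mono_ce = -math.log(mono_post[monophone_label])
    return tri_ce + aux_weight * mono_ce, tri_post, mono_post


rng = random.Random(0)
# Hypothetical sizes, far smaller than any real acoustic model.
N_FEATURES, N_HIDDEN, N_TRIPHONE, N_MONOPHONE = 8, 16, 12, 4
shared_w, shared_b = init_layer(rng, N_HIDDEN, N_FEATURES)
tri_w, tri_b = init_layer(rng, N_TRIPHONE, N_HIDDEN)
mono_w, mono_b = init_layer(rng, N_MONOPHONE, N_HIDDEN)
params = (shared_w, shared_b, tri_w, tri_b, mono_w, mono_b)

# One synthetic feature frame with made-up state labels.
frame = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
loss, tri_post, mono_post = multitask_loss(
    params, frame, triphone_label=3, monophone_label=1)
```

In a setup like this, the gradient of the combined loss reaches the shared layer from both heads, which is the usual mechanism by which an auxiliary task can help generalization.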
