Mixing heterogeneous loss types to improve accuracy of keyword spotting
Abstract:
A method for training a neural network includes receiving a training input audio sequence including a sequence of input frames defining a hotword that initiates a wake-up process on a user device. The method further includes obtaining a first label and a second label for the training input audio sequence. The method includes generating, using a memorized neural network and the training input audio sequence, an output indicating a likelihood the training input audio sequence includes the hotword. The method further includes determining a first loss based on the first label and the output. The method includes determining a second loss based on the second label and the output. The method further includes optimizing the memorized neural network based on the first loss and the second loss associated with the training input audio sequence.
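As an illustration of how two losses might be computed from a single output and then combined, here is a minimal Python sketch. The abstract does not say what the two labels or the two losses are, so everything specific here is an assumption: the first label is taken to be a per-frame label scored with frame-level cross-entropy, the second label is taken to be a single sequence-level label scored against the maximum frame probability, and the two losses are mixed with a hypothetical weight `alpha`. All function names are illustrative and do not come from the source.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def frame_level_loss(frame_probs, frame_labels):
    """Assumed first loss: mean cross-entropy over per-frame labels."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(frame_probs, frame_labels)]
    return sum(losses) / len(losses)


def sequence_level_loss(frame_probs, sequence_label):
    """Assumed second loss: cross-entropy on the max-pooled frame probability."""
    return binary_cross_entropy(max(frame_probs), sequence_label)


def mixed_loss(frame_probs, frame_labels, sequence_label, alpha=0.5):
    """Weighted sum of the two losses; alpha is a hypothetical mixing weight."""
    first = frame_level_loss(frame_probs, frame_labels)
    second = sequence_level_loss(frame_probs, sequence_label)
    return alpha * first + (1.0 - alpha) * second


# Toy network output: per-frame hotword probabilities for one training sequence.
frame_probs = [0.1, 0.2, 0.9, 0.8]
frame_labels = [0, 0, 1, 1]   # assumed first label (per frame)
sequence_label = 1            # assumed second label (whole sequence)

loss = mixed_loss(frame_probs, frame_labels, sequence_label)
```

In a real training loop, a gradient-based optimizer would minimize the combined value with respect to the network parameters. That step is omitted here because the abstract does not describe it beyond optimizing the network based on both losses.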