Frequency warping in a speech recognition system
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.
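The frequency-scaling step described above can be illustrated with a minimal sketch. It assumes each audio frame is already a magnitude spectrum and that a single warping factor per frame is applied linearly, so that content at frequency bin k moves to bin alpha * k. The abstract does not specify the warping function, the frame representation, or how the networks are built, so `warp_frame`, `warp_sequence`, and the linear interpolation are hypothetical choices made for illustration. The warping factors, which the abstract says come from a warping neural network, are simply passed in as numbers here, and the decoding neural network is omitted.

```python
import numpy as np


def warp_frame(spectrum, alpha):
    """Linearly warp one frame (assumed to be a magnitude spectrum).

    Content at source bin k ends up at bin alpha * k in the output.
    Both the linear warp and the spectrum representation are assumptions.
    """
    if alpha <= 0:
        raise ValueError("warping factor must be positive")
    spectrum = np.asarray(spectrum, dtype=float)
    bins = np.arange(spectrum.size)
    # Output bin k reads its value from source position k / alpha.
    source_pos = bins / alpha
    # Positions past the last source bin have no data and are set to 0.
    return np.interp(source_pos, bins, spectrum, left=0.0, right=0.0)


def warp_sequence(frames, alphas):
    """Apply one warping factor per frame and stack the modified frames."""
    return np.stack([warp_frame(f, a) for f, a in zip(frames, alphas)])


if __name__ == "__main__":
    frame = np.zeros(64)
    frame[10] = 1.0  # a single spectral peak at bin 10
    modified = warp_sequence([frame, frame], [1.5, 0.5])
    # Report where the peak landed in each modified frame.
    print(np.argmax(modified, axis=1))
```

In a full system along the lines of the abstract, the values passed as `alphas` would be predicted per frame by the warping network, and the stacked output would be fed to the decoder.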