Speech recognition using unspoken text and speech synthesis

Invention Grant

US11605368B2 Speech recognition using unspoken text and speech synthesis (Active)

Patent Title: Speech recognition using unspoken text and speech synthesis
Application No.: US17454536

Application Date: 2021-11-11
Publication No.: US11605368B2

Publication Date: 2023-03-14
Inventor: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant Griffith
Main IPC: G10L15/16
IPC: G10L15/16; G10L13/00; G10L13/08; G10L15/06

Abstract:

A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in a non-synthetic speech representation, selected from a set of spoken training utterances, relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
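
The per-step loop described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the one-parameter "TTS model" (generator), the fixed logistic discriminator, the scalar "acoustic features", the function names, and the use of the standard GAN minimax objective as the adversarial loss term. The abstract does not give the loss formula or the model architecture, and the joint training of the speech recognition model is left out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminator(x, w, b):
    """Toy discriminator: probability that feature x is non-synthetic."""
    return sigmoid(w * x + b)

def generator(text_feature, theta):
    """Toy stand-in for a TTS model: text feature -> synthetic acoustic feature."""
    return theta * text_feature

def adversarial_loss(real_x, synth_x, w, b):
    """Assumed loss form (standard GAN value function) for one output step:
    log D(real) + log(1 - D(synthetic)). The generator tries to lower it."""
    d_real = discriminator(real_x, w, b)
    d_synth = discriminator(synth_x, w, b)
    return math.log(d_real) + math.log(1.0 - d_synth)

def generator_step(text_feature, theta, w, b, lr=0.1):
    """One gradient-descent step on the adversarial loss with respect to theta.
    d/dtheta log(1 - sigmoid(w*theta*t + b)) = -D(synthetic) * w * t."""
    d_synth = discriminator(generator(text_feature, theta), w, b)
    grad = -d_synth * w * text_feature
    return theta - lr * grad

def train_generator(text_utterances, real_features, theta, w, b, lr=0.1):
    """Visit every output step of every training text utterance, compute the
    adversarial loss term, and update the generator parameter from it.
    The discriminator (w, b) is held fixed to keep the sketch short."""
    total_loss = 0.0
    for steps in text_utterances:
        for i, t in enumerate(steps):
            real_x = real_features[i % len(real_features)]  # pick a spoken example
            total_loss += adversarial_loss(real_x, generator(t, theta), w, b)
            theta = generator_step(t, theta, w, b, lr)
    return theta, total_loss

# Hypothetical data: two text utterances with per-step scalar features.
theta, total_loss = train_generator([[1.0, 2.0], [0.5]], [1.5, 2.0],
                                    theta=0.0, w=1.0, b=0.0)
```

In this sketch each update moves the synthetic feature toward values the discriminator scores as non-synthetic. A real system would operate on spectrogram-like sequences with neural networks and would also update the discriminator.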

Published/Granted Documents

US20220068255A1 Speech Recognition Using Unspoken Text and Speech Synthesis; Publication/Grant Date: 2022-03-03

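IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..Using artificial neural networks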