Speech synthesis using deep neural networks

Invention Grant

US08527276B1 Speech synthesis using deep neural networks (in force)

Patent Title: Speech synthesis using deep neural networks
Application No.: US13660722

Application Date: 2012-10-25
Publication No.: US08527276B1

Publication Date: 2013-09-03
Inventor: Andrew William Senior, Byungha Chun, Michael Schuster
Applicant: Andrew William Senior, Byungha Chun, Michael Schuster
Applicant Address: US CA Mountain View
Assignee: Google Inc.
Current Assignee: Google Inc.
Current Assignee Address: US CA Mountain View
Agency: McDonnell Boehnen Hulbert & Berghoff LLP
Main IPC: G10L13/00
IPC: G10L13/00

Abstract:

A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
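
The run-time path in the abstract (text, then phonetic-context descriptors, then predicted feature vectors) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy phone inventory and lexicon, the `PhoneticContextDescriptor` field names, durations counted in frames, one feature vector per frame, and an untrained single-hidden-layer network with random weights standing in for the trained deep network. The signal generation module is omitted.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy phone inventory and lexicon (assumption, not from the patent).
PHONES = ["sil", "k", "ae", "t", "d", "ao", "g"]
LEXICON = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}


@dataclass
class PhoneticContextDescriptor:
    phone: str     # the phonetic speech unit
    left: str      # preceding phone (phonetic context)
    right: str     # following phone (phonetic context)
    duration: int  # duration in frames (assumed unit)


def text_to_descriptors(text, frames_per_phone=3):
    """Translate text into a sequence of phonetic-context descriptors."""
    phones = ["sil"]
    for word in text.lower().split():
        phones.extend(LEXICON[word])
    phones.append("sil")
    return [
        PhoneticContextDescriptor(phones[i], phones[i - 1], phones[i + 1],
                                  frames_per_phone)
        for i in range(1, len(phones) - 1)
    ]


def one_hot(phone):
    return [1.0 if p == phone else 0.0 for p in PHONES]


def encode(desc, frame_index):
    """Encode one frame of a descriptor as a numeric input vector."""
    position = frame_index / desc.duration  # relative position inside the phone
    return (one_hot(desc.left) + one_hot(desc.phone) + one_hot(desc.right)
            + [float(desc.duration), position])


class TinyNetwork:
    """Untrained stand-in for the trained network: one tanh hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def predict_features(descriptors, network):
    """Map each descriptor to one predicted feature vector per frame."""
    features = []
    for desc in descriptors:
        for frame in range(desc.duration):
            features.append(network.forward(encode(desc, frame)))
    return features


INPUT_DIM = 3 * len(PHONES) + 2
network = TinyNetwork(INPUT_DIM, n_hidden=8, n_out=4)
descriptors = text_to_descriptors("cat dog")
features = predict_features(descriptors, network)
print(len(descriptors), len(features), len(features[0]))
```

In a full system the weights would come from training on phonetic transcriptions paired with feature vectors extracted from recorded speech, and the predicted vectors would be passed to a signal generation module to produce the waveform.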

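IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems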