Systems and methods for neural text-to-speech using convolutional sequence learning

Invention Grant

US10796686B2 Systems and methods for neural text-to-speech using convolutional sequence learning (In force)

Patent Title: Systems and methods for neural text-to-speech using convolutional sequence learning
Application No.: US16058265

Application Date: 2018-08-08
Publication No.: US10796686B2

Publication Date: 2020-10-06
Inventor: Sercan O. Arik, Wei Ping, Kainan Peng, Sharan Narang, Ajay Kannan, Andrew Gibiansky, Jonathan Raiman, John Miller
Applicant: Baidu USA, LLC
Applicant Address: US CA Sunnyvale
Assignee: Baidu USA LLC
Current Assignee: Baidu USA LLC
Current Assignee Address: US CA Sunnyvale
Agency: North Weber & Baugh LLP
Main IPC: G10L13/027
IPC: G10L13/027; G10L13/08; G10L13/047

Abstract:

Described herein are embodiments of a fully-convolutional attention-based neural text-to-speech (TTS) system, various embodiments of which may generally be referred to as Deep Voice 3. Embodiments of Deep Voice 3 match state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. Deep Voice 3 embodiments were scaled to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, common error modes of attention-based speech synthesis networks were identified and mitigated, and several different waveform synthesis methods were compared. Also presented are embodiments that describe how to scale inference to ten million queries per day on a single-GPU server.

Public/Granted literature

US20190122651A1 SYSTEMS AND METHODS FOR NEURAL TEXT-TO-SPEECH USING CONVOLUTIONAL SEQUENCE LEARNING
Public/Granted day: 2019-04-25

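IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/02	.Methods for producing synthetic speech; speech synthesisers
G10L13/027	..Concept-to-speech synthesisers; generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text, see G10L13/08)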