Invention Grant
- Patent Title: Systems and methods for neural text-to-speech using convolutional sequence learning
- Application No.: US16058265
- Application Date: 2018-08-08
- Publication No.: US10796686B2
- Publication Date: 2020-10-06
- Inventor: Sercan O. Arik, Wei Ping, Kainan Peng, Sharan Narang, Ajay Kannan, Andrew Gibiansky, Jonathan Raiman, John Miller
- Applicant: Baidu USA, LLC
- Applicant Address: US CA Sunnyvale
- Assignee: Baidu USA LLC
- Current Assignee: Baidu USA LLC
- Current Assignee Address: US CA Sunnyvale
- Agency: North Weber & Baugh LLP
- Main IPC: G10L13/027
- IPC: G10L13/027; G10L13/08; G10L13/047

Abstract:
Described herein are embodiments of a fully-convolutional attention-based neural text-to-speech (TTS) system, various embodiments of which may generally be referred to as Deep Voice 3. Embodiments of Deep Voice 3 match state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. Deep Voice 3 embodiments were scaled to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, common error modes of attention-based speech synthesis networks were identified and mitigated, and several different waveform synthesis methods were compared. Also presented are embodiments that describe how to scale inference to ten million queries per day on one single-GPU server.
Public/Granted literature
- US20190122651A1 SYSTEMS AND METHODS FOR NEURAL TEXT-TO-SPEECH USING CONVOLUTIONAL SEQUENCE LEARNING Public/Granted day: 2019-04-25