Invention Grant
- Patent Title: Systems and methods for neural voice cloning with a few samples
- Application No.: US16143330
- Application Date: 2018-09-26
- Publication No.: US11238843B2
- Publication Date: 2022-02-01
- Inventor: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
- Applicant: Baidu USA, LLC
- Applicant Address: US CA Sunnyvale
- Assignee: Baidu USA, LLC
- Current Assignee: Baidu USA, LLC
- Current Assignee Address: US CA Sunnyvale
- Agency: North Weber & Baugh LLP
- Main IPC: G10L13/00
- IPC: G10L13/00; G10L13/047; G10L13/027; G10L13/08

Abstract:
Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
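
The contrast between the two approaches in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the patented implementation: the "multi-speaker generative model" is a fixed linear map (`W`, `V`), adaptation fine-tunes only the speaker embedding rather than the whole model, and the "speaker encoder" is a linear least-squares fit rather than a neural network. All names (`generate`, `adapt_embedding`, `train_encoder`, `encode_speaker`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, EMB_DIM = 4, 3
AUDIO_DIM = TEXT_DIM + EMB_DIM

# Hypothetical stand-in for a trained multi-speaker generative model:
# audio = W @ text + V @ speaker_embedding. Orthonormal columns keep the
# toy problem well conditioned; a real model would be a neural network.
Q, _ = np.linalg.qr(rng.normal(size=(AUDIO_DIM, AUDIO_DIM)))
W, V = Q[:, :TEXT_DIM], Q[:, TEXT_DIM:]


def generate(text, emb):
    """Toy synthesis: combine text features with a speaker embedding."""
    return W @ text + V @ emb


def adapt_embedding(texts, audios, steps=100, lr=0.25):
    """Speaker adaptation (embedding-only simplification): gradient descent
    on the squared error averaged over the cloning samples, with the model
    weights W and V held fixed."""
    emb = np.zeros(EMB_DIM)
    for _ in range(steps):
        grad = np.zeros(EMB_DIM)
        for text, audio in zip(texts, audios):
            grad += 2.0 * V.T @ (generate(text, emb) - audio)
        emb -= lr * grad / len(texts)
    return emb


def train_encoder(n_speakers=200):
    """Speaker encoding: fit a separate linear map from audio to embedding
    on synthetic training speakers whose embeddings are known."""
    embs = rng.normal(size=(n_speakers, EMB_DIM))
    texts = rng.normal(size=(n_speakers, TEXT_DIM))
    audios = texts @ W.T + embs @ V.T
    coef, *_ = np.linalg.lstsq(audios, embs, rcond=None)
    return coef  # shape (AUDIO_DIM, EMB_DIM)


def encode_speaker(coef, audios):
    """Infer an embedding directly from cloning audio, with no fine-tuning;
    per-sample estimates are averaged."""
    return (np.asarray(audios) @ coef).mean(axis=0)


# An unseen speaker with three cloning samples.
true_emb = rng.normal(size=EMB_DIM)
clone_texts = rng.normal(size=(3, TEXT_DIM))
clone_audios = [generate(t, true_emb) for t in clone_texts]

adapted = adapt_embedding(clone_texts, clone_audios)
encoded = encode_speaker(train_encoder(), clone_audios)
print(np.abs(adapted - true_emb).max(), np.abs(encoded - true_emb).max())
```

In this sketch, adaptation needs an optimization loop for each new speaker, while the encoder is trained once and then produces an embedding in a single forward computation. That is the trade-off the two approaches suggest, shown here only on noiseless synthetic data.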
Public/Granted literature
- US20190251952A1 SYSTEMS AND METHODS FOR NEURAL VOICE CLONING WITH A FEW SAMPLES, Public/Granted day: 2019-08-15