Systems and methods for neural voice cloning with a few samples

Invention Grant

US11238843B2 Systems and methods for neural voice cloning with a few samples (Active)

Patent Title: Systems and methods for neural voice cloning with a few samples
Application No.: US16143330

Application Date: 2018-09-26
Publication No.: US11238843B2

Publication Date: 2022-02-01
Inventor: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Applicant: Baidu USA, LLC
Applicant Address: US CA Sunnyvale
Assignee: Baidu USA, LLC
Current Assignee: Baidu USA, LLC
Current Assignee Address: US CA Sunnyvale
Agency: North Weber & Baugh LLP
Main IPC: G10L13/00
IPC: G10L13/00; G10L13/047; G10L13/027; G10L13/08

Abstract:

Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
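
The contrast between the two approaches named in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical stand-in rather than the patented implementation: the "generative model" is a one-line linear function, embeddings are single numbers, adaptation updates only the embedding (a simplifying assumption), and the "encoder" is a one-dimensional least-squares fit. The names `generate`, `adapt_embedding`, and `train_encoder` are invented for illustration.

```python
from statistics import mean

GAIN = 2.0  # fixed weight of the toy "multi-speaker generative model" (assumption)


def generate(text: float, embedding: float) -> float:
    """Toy generative model: text feature + speaker embedding -> audio feature."""
    return text + GAIN * embedding


def adapt_embedding(samples, steps=50, lr=0.05):
    """Speaker adaptation (toy): fine-tune a new embedding on the cloning
    samples by gradient descent on the mean squared error."""
    e = 0.0
    for _ in range(steps):
        # Gradient of mean((generate(t, e) - a)^2) with respect to e.
        grad = mean([2.0 * GAIN * (generate(t, e) - a) for t, a in samples])
        e -= lr * grad
    return e


def train_encoder(speakers):
    """Speaker encoding (toy): fit a separate model e_hat = a * mean(audio) + b
    on speakers whose embeddings are already known."""
    xs = [mean(audio) for audio, _ in speakers]
    es = [e for _, e in speakers]
    x_bar, e_bar = mean(xs), mean(es)
    a = sum((x - x_bar) * (e - e_bar) for x, e in zip(xs, es)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = e_bar - a * x_bar
    return lambda audio: a * mean(audio) + b


# Training speakers with known embeddings (made-up values).
train_texts = [-1.0, 0.0, 1.0]
train_speakers = [
    ([generate(t, e) for t in train_texts], e) for e in (-1.0, 0.5, 2.0)
]

# A new speaker with only two cloning samples. The text features average to
# zero, which is what lets this toy encoder ignore the text entirely.
true_embedding = 1.25
clone_texts = [-0.5, 0.5]
clone_audio = [generate(t, true_embedding) for t in clone_texts]

adapted = adapt_embedding(list(zip(clone_texts, clone_audio)))
encoder = train_encoder(train_speakers)
encoded = encoder(clone_audio)  # no gradient steps needed for the new speaker
```

In this sketch, adaptation needs an optimization loop for every new speaker, whereas the encoder is trained once and then infers the embedding in a single pass over the cloning audio.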

Public/Granted literature

US20190251952A1 SYSTEMS AND METHODS FOR NEURAL VOICE CLONING WITH A FEW SAMPLES Public/Granted day: 2019-08-15

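IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems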