System and method for cross-speaker style transfer in text-to-speech and training data generation

Invention Grant

US11361753B2 System and method for cross-speaker style transfer in text-to-speech and training data generation (Active)

Patent Title: System and method for cross-speaker style transfer in text-to-speech and training data generation
Application No.: US17030871

Application Date: 2020-09-24
Publication No.: US11361753B2

Publication Date: 2022-06-14
Inventor: Shifeng Pan, Lei He, Yulin Li, Sheng Zhao, Chunling Ma
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Applicant Address: US WA Redmond
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Current Assignee Address: US WA Redmond
Agency: Workman Nydegger
Priority: CN202010885556.9 20200828
Main IPC: G10L13/10
IPC: G10L13/10; G10L15/06; G10L21/013; G10L25/18; G10L25/63; G10L15/18; G10L25/30; G10L15/187

Abstract:

Systems are configured to generate spectrogram data characterized by the voice timbre of a target speaker and the prosody style of a source speaker. They do so by converting a waveform of source speaker data to phonetic posteriorgram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are also configured to use or train a machine learning model for generating the spectrogram data, and to train a neural text-to-speech model with the generated spectrogram data.
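
The data flow described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: every function name is hypothetical, the choice of pitch and energy as the prosody features is an assumption, and each stage is a simple arithmetic stand-in for what would be a trained model in a real system.

```python
import math
from typing import List, Tuple


def split_frames(waveform: List[float], frame_len: int) -> List[List[float]]:
    """Cut the waveform into non-overlapping frames, dropping any remainder."""
    count = len(waveform) // frame_len
    return [waveform[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def waveform_to_ppg(frames: List[List[float]], num_classes: int) -> List[List[float]]:
    """Stand-in for a speaker-independent acoustic model (assumption).

    Returns one probability distribution over phonetic classes per frame,
    which is the shape a phonetic posteriorgram has.
    """
    ppg = []
    for frame in frames:
        # Placeholder scores; a real system would use a trained network here.
        scores = [sum(abs(x) for x in frame[k::num_classes]) for k in range(num_classes)]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        ppg.append([e / total for e in exps])
    return ppg


def extract_prosody(frames: List[List[float]], sample_rate: int) -> List[Tuple[float, float]]:
    """Per-frame (pitch, energy); both feature choices are assumptions."""
    features = []
    for frame in frames:
        # Crude pitch estimate from the zero-crossing rate.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        pitch_hz = crossings * sample_rate / (2 * len(frame))
        # Root-mean-square energy of the frame.
        energy = math.sqrt(sum(x * x for x in frame) / len(frame))
        features.append((pitch_hz, energy))
    return features


def generate_spectrogram(
    ppg: List[List[float]],
    prosody: List[Tuple[float, float]],
    speaker_embedding: List[float],
    num_bins: int,
) -> List[List[float]]:
    """Stand-in for the trained conversion model (assumption).

    Combines source-speaker PPG and prosody with a target-speaker embedding
    and emits one spectrogram row per frame using fixed placeholder weights.
    """
    spectrogram = []
    for posteriors, (pitch_hz, energy) in zip(ppg, prosody):
        inputs = posteriors + [pitch_hz / 1000.0, energy] + speaker_embedding
        row = [
            sum(v * math.cos((b + 1) * (j + 1)) for j, v in enumerate(inputs))
            for b in range(num_bins)
        ]
        spectrogram.append(row)
    return spectrogram


# Toy source-speaker waveform: 0.1 s of a 200 Hz sine at 16 kHz.
sample_rate = 16000
source_waveform = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]

frames = split_frames(source_waveform, frame_len=400)
ppg = waveform_to_ppg(frames, num_classes=8)
prosody = extract_prosody(frames, sample_rate)
target_speaker = [0.1, -0.3, 0.5]  # hypothetical target-speaker embedding
spectrogram = generate_spectrogram(ppg, prosody, target_speaker, num_bins=80)

print(len(spectrogram), len(spectrogram[0]))  # 4 frames x 80 bins
```

In a setup like this, the resulting spectrograms (target timbre, source prosody) would serve as training data for a neural text-to-speech model, as the abstract describes.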

Public/Granted literature

US20220068259A1 SYSTEM AND METHOD FOR CROSS-SPEAKER STYLE TRANSFER IN TEXT-TO-SPEECH AND TRAINING DATA GENERATION Public/Granted day: 2022-03-03

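IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/08	.Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation, or stress or intonation determination
G10L13/10	..Prosody rules derived from text; stress or intonation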