Two-level speech prosody transfer

Invention Grant

US12327544B2 Two-level speech prosody transfer (Active)

Patent Title: Two-level speech prosody transfer
Application No.: US18054604

Application Date: 2022-11-11
Publication No.: US12327544B2

Publication Date: 2025-06-10
Inventor: Lev Finkelstein, Chun-an Chan, Byungha Chun, Ye Jia, Yu Zhang, Robert Andrew James Clark, Vincent Wan
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant Griffith
Main IPC: G10L13/10
IPC: G10L13/10; G10L13/02; G10L17/18

Abstract:

A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
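
The data flow described in the abstract can be sketched as follows. This is only an illustrative toy under stated assumptions, not the patented implementation: the names `TwoLevelPipeline`, `toy_first_tts`, `toy_encoder`, `toy_decoder`, and `TARGET_VOICE_PITCH_OFFSET` are hypothetical, and the arithmetic inside each function is a placeholder for a trained neural network. What the sketch does preserve is the order of operations: a first TTS model produces an intermediate representation carrying the intended prosody, an encoder compresses it into a fixed-length utterance embedding, and a decoder combines the text with that embedding to produce output in the target voice.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for one acoustic frame: [pitch-like, energy-like]

# Hypothetical constant standing in for the target voice's speaker characteristics.
TARGET_VOICE_PITCH_OFFSET = 20.0


def toy_first_tts(text: str) -> List[Frame]:
    """Placeholder first TTS model: one frame per character, carrying 'prosody'."""
    return [[100.0 + (ord(c) % 16), 1.0 if c.isupper() else 0.5] for c in text]


def toy_encoder(frames: Sequence[Frame]) -> List[float]:
    """Placeholder encoder: mean-pools frames into a fixed-length utterance embedding."""
    if not frames:
        raise ValueError("cannot encode an empty intermediate representation")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def toy_decoder(text: str, embedding: Sequence[float]) -> List[Frame]:
    """Placeholder decoder: conditions on text and embedding, renders the target voice."""
    pitch, energy = embedding
    return [[pitch + TARGET_VOICE_PITCH_OFFSET, energy] for _ in text]


@dataclass
class TwoLevelPipeline:
    # All three callables are assumptions standing in for trained models.
    first_tts: Callable[[str], List[Frame]]
    encoder: Callable[[Sequence[Frame]], List[float]]
    decoder: Callable[[str, Sequence[float]], List[Frame]]

    def synthesize(self, text: str) -> List[Frame]:
        intermediate = self.first_tts(text)        # level 1: intended prosody
        embedding = self.encoder(intermediate)     # utterance embedding
        return self.decoder(text, embedding)       # level 2: target voice output


pipeline = TwoLevelPipeline(toy_first_tts, toy_encoder, toy_decoder)
output_frames = pipeline.synthesize("Hi")
```

Note that the utterance embedding has the same length regardless of how long the input text is, which is what lets the decoder treat it as a compact prosody specification separate from the text itself.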

Public/Granted literature

US20230064749A1 Two-Level Speech Prosody Transfer, Public/Granted day: 2023-03-02

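IPC Classification:

G	Physics
G10	Musical instruments; Acoustics
G10L	Speech analysis or synthesis; Speech recognition; Speech or voice processing; Speech or audio coding or decoding
G10L13/00	Speech synthesis; Text-to-speech systems
G10L13/08	.Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
G10L13/10	..Prosody rules derived from text; Stress or intonation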