Invention Grant
- Patent Title: Two-level speech prosody transfer
- Application No.: US18054604
- Application Date: 2022-11-11
- Publication No.: US12327544B2
- Publication Date: 2025-06-10
- Inventor: Lev Finkelstein, Chun-an Chan, Byungha Chun, Ye Jia, Yu Zhang, Robert Andrew James Clark, Vincent Wan
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant Griffith
- Main IPC: G10L13/10
- IPC: G10L13/10; G10L13/02; G10L17/18

Abstract:
A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
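The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: every name here (`SecondTTSModel`, `two_level_prosody_transfer`, the `toy_*` stand-ins, and the list-of-frames representation) is a hypothetical placeholder, and the toy functions replace what would be trained neural models with trivial arithmetic so the pipeline can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical representation: a sequence of fixed-size feature frames.
Frames = List[List[float]]


@dataclass
class SecondTTSModel:
    """Hypothetical second-level model with an encoder and a decoder portion."""
    encoder: Callable[[Frames], List[float]]      # frames -> utterance embedding
    decoder: Callable[[str, List[float]], Frames]  # (text, embedding) -> output frames

    def synthesize(self, text: str, intermediate: Frames) -> Frames:
        # Encode the intermediate representation into an utterance embedding
        # that is meant to carry the intended prosody.
        utterance_embedding = self.encoder(intermediate)
        # Decode the text, conditioned on that embedding, in the target voice.
        return self.decoder(text, utterance_embedding)


def two_level_prosody_transfer(
    text: str,
    first_tts: Callable[[str], Frames],
    second_tts: SecondTTSModel,
) -> Frames:
    # Level 1: the first model produces an intermediate representation
    # that has the intended prosody (voice quality is not the goal here).
    intermediate = first_tts(text)
    # Level 2: the second model transfers that prosody to the target voice.
    return second_tts.synthesize(text, intermediate)


# --- Toy stand-ins (assumptions for illustration, not real TTS models) ---

def toy_first_tts(text: str) -> Frames:
    # One 2-dim frame per character: [rising contour value, constant energy].
    n = max(len(text), 1)
    return [[i / n, 1.0] for i in range(len(text))]


def toy_encoder(frames: Frames) -> List[float]:
    # Mean-pool a variable-length sequence into a fixed-length embedding.
    if not frames:
        return [0.0, 0.0]
    dims = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dims)]


def make_toy_decoder(voice_offset: float) -> Callable[[str, List[float]], Frames]:
    # The offset stands in for the speaker characteristics of the target voice.
    def decode(text: str, embedding: List[float]) -> Frames:
        return [[e + voice_offset for e in embedding] for _ in text]
    return decode


model = SecondTTSModel(encoder=toy_encoder, decoder=make_toy_decoder(0.5))
output_frames = two_level_prosody_transfer("hi", toy_first_tts, model)
print(output_frames)
```

The point of the sketch is the separation of concerns: the first model only has to get the prosody right, while the second model's decoder supplies the target voice, with the utterance embedding as the only channel between them.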
Public/Granted literature
- US20230064749A1 Two-Level Speech Prosody Transfer (Public/Granted day: 2023-03-02)