Method and system for aligning natural and synthetic video to speech synthesis

Invention Grant

US07844463B2 Method and system for aligning natural and synthetic video to speech synthesis (in force)

Patent Title: Method and system for aligning natural and synthetic video to speech synthesis
Application No.: US12193397

Application Date: 2008-08-18
Publication No.: US07844463B2

Publication Date: 2010-11-30
Inventor: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
Applicant: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
Applicant Address: US NY New York
Assignee: AT&T Intellectual Property II, L.P.
Current Assignee: AT&T Intellectual Property II, L.P.
Current Assignee Address: US NY New York
Main IPC: G10L13/00
IPC: G10L13/00; G06T13/00

Abstract:

In the MPEG-4 TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters (FAPs). A Text-to-Speech (TTS) converter drives the mouth shapes of the face, while an encoder sends FAPs to the face. The text input to the TTS converter can include codes, or bookmarks, placed between and inside words. Each bookmark carries an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. The FAP stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp together with a real-time time stamp. The facial animation system then associates the correct FAP with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
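
The alignment described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent or from MPEG-4: the `<ETS=n>` bookmark syntax, the `Fap` record, the `synthesize` and `align` functions, and the fixed 60 ms per character timing that stands in for a real synthesizer's timing.

```python
import re
from dataclasses import dataclass

# Hypothetical bookmark syntax used only in this sketch: "<ETS=n>" in the text.
BOOKMARK = re.compile(r"<ETS=(\d+)>")


@dataclass
class Fap:
    ets: int      # encoder time stamp, interpreted as a counter
    params: dict  # illustrative facial animation parameter values


def synthesize(text, ms_per_char=60):
    """Toy stand-in for a TTS converter.

    Strips the bookmarks from the text and, for each bookmark, reports the
    encoder time stamp together with the real-time stamp (in milliseconds)
    at which playback reaches it. The fixed duration per character is an
    assumption of this sketch.
    """
    plain_parts = []
    marks = []        # list of (encoder time stamp, real-time stamp in ms)
    pos = 0
    spoken_chars = 0
    for match in BOOKMARK.finditer(text):
        segment = text[pos:match.start()]
        plain_parts.append(segment)
        spoken_chars += len(segment)
        marks.append((int(match.group(1)), spoken_chars * ms_per_char))
        pos = match.end()
    plain_parts.append(text[pos:])
    return "".join(plain_parts), marks


def align(faps, marks):
    """Attach a real-time stamp to each FAP whose encoder time stamp matches
    a bookmark, and return the pairs ordered by real time."""
    ets_to_ms = dict(marks)
    schedule = [(ets_to_ms[f.ets], f) for f in faps if f.ets in ets_to_ms]
    return sorted(schedule, key=lambda pair: pair[0])


if __name__ == "__main__":
    plain, marks = synthesize("Hello <ETS=1>world, nice <ETS=2>day")
    faps = [Fap(2, {"smile": 1}), Fap(1, {"raise_brows": 1})]
    for ms, fap in align(faps, marks):
        print(ms, fap)
```

In this sketch the encoder time stamp serves only as a lookup key, which matches the abstract's point that it is a counter and not a clock value; the real-time stamp comes entirely from the synthesizer side.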

Published/Granted literature

US20080312930A1 METHOD AND SYSTEM FOR ALIGNING NATURAL AND SYNTHETIC VIDEO TO SPEECH SYNTHESIS Publication date: 2008-12-18

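IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems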