System and method for triphone-based unit selection for visual speech synthesis

Invention Grant

US07933772B1 System and method for triphone-based unit selection for visual speech synthesis (in force)

Patent Title: System and method for triphone-based unit selection for visual speech synthesis
Application No.: US12051311

Application Date: 2008-03-19
Publication No.: US07933772B1

Publication Date: 2011-04-26
Inventor: Eric Cosatto, Hans Peter Graf, Fu Jie Huang
Applicant: Eric Cosatto, Hans Peter Graf, Fu Jie Huang
Applicant Address: US GA Atlanta
Assignee: AT&T Intellectual Property II, L.P.
Current Assignee: AT&T Intellectual Property II, L.P.
Current Assignee Address: US GA Atlanta
Main IPC: G10L11/00
IPC: G10L11/00; G06T13/00; G06K9/00

Abstract:

A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system uses a database of n-phones as the smallest selectable unit, where n is larger than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and a speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence by finding the optimal path through the lattice, that is, the path that minimizes the sum of the target cost and the joint cost over the sequence.
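The sampling and lattice search described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names `resample` and `best_path`, the evenly spaced index sampling, and the toy cost values are assumptions made for illustration only. The abstract does not say how the target cost combines phonetic distance, coarticulation, and speech rate, so target costs are passed in here as precomputed numbers, and the search is a standard dynamic-programming (Viterbi-style) minimization.

```python
from typing import Callable, List, Sequence, Tuple


def resample(frames: Sequence[int], n: int) -> List[int]:
    """Pick n frames from a candidate unit by evenly spaced index sampling.

    Assumption: the abstract only says candidates are sampled to match the
    target frame count; nearest-index sampling is one simple way to do that.
    """
    if n <= 0 or not frames:
        return []
    if n == 1:
        return [frames[len(frames) // 2]]
    step = (len(frames) - 1) / (n - 1)
    return [frames[round(i * step)] for i in range(n)]


def best_path(
    lattice: Sequence[Sequence[int]],
    target_cost: Sequence[Sequence[float]],
    joint_cost: Callable[[int, int], float],
) -> Tuple[List[int], float]:
    """Return the frame sequence minimizing summed target and joint costs.

    lattice[t] lists the candidate frame ids for target frame t,
    target_cost[t][k] is the (precomputed, hypothetical) cost of candidate k
    at position t, and joint_cost(a, b) scores placing frame b after frame a.
    """
    if not lattice:
        return [], 0.0
    cost = list(target_cost[0])      # best cumulative cost ending at each candidate
    back: List[List[int]] = []       # back-pointers, one list per transition
    for t in range(1, len(lattice)):
        new_cost, ptr = [], []
        for k, frame in enumerate(lattice[t]):
            prev = lattice[t - 1]
            # Cheapest predecessor for this candidate frame.
            j = min(range(len(prev)),
                    key=lambda i: cost[i] + joint_cost(prev[i], frame))
            new_cost.append(cost[j] + joint_cost(prev[j], frame) + target_cost[t][k])
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [lattice[t][k] for t, k in enumerate(path)], total


# Toy example: frames recorded consecutively join for free (an assumption).
toy_lattice = [[10, 50], [11, 51], [12, 90]]
toy_target = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
toy_joint = lambda a, b: 0.0 if b == a + 1 else 5.0
frames, total = best_path(toy_lattice, toy_target, toy_joint)
```

In the toy example the search keeps the consecutive frames 10, 11, 12 even though frame 51 has a lower target cost, because switching to frame 90 afterwards would incur a larger joint cost.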
