Systems and methods for using latent variable modeling for multi-modal video indexing

Invention Grant

US09542934B2 Systems and methods for using latent variable modeling for multi-modal video indexing (in force)

Patent Title: Systems and methods for using latent variable modeling for multi-modal video indexing
Application No.: US14192861

Application Date: 2014-02-27
Publication No.: US09542934B2

Publication Date: 2017-01-10
Inventor: Matthew L. Cooper, Dhiraj Joshi, Huizhong Chen
Applicant: FUJI XEROX CO., LTD.
Applicant Address: JP Tokyo
Assignee: FUJI XEROX CO., LTD.
Current Assignee: FUJI XEROX CO., LTD.
Current Assignee Address: JP Tokyo
Agency: TransPacific Law Group
Agent: Pavel I. Pogodin, Esq.
Main IPC: G10L15/05
IPC: G10L15/05; G06F17/30; G11B27/00; G06K9/32; H04N21/2343; H04N21/4402; G06K9/00; H04N7/088; G06N7/00

Abstract:

A computer-implemented method performed in connection with a computerized system incorporating a processing unit and a memory, the computer-implemented method involving: using the processing unit to generate a multi-modal language model for co-occurrence of spoken words and displayed text in a plurality of videos; selecting at least a portion of a first video; extracting a plurality of spoken words from the selected portion of the first video; extracting a first displayed text from the selected portion of the first video; and using the processing unit and the generated multi-modal language model to rank the extracted plurality of spoken words based on their probability of occurrence conditioned on the extracted first displayed text.
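
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the patented latent variable model: it substitutes a plain smoothed co-occurrence count model, and it assumes the spoken words and displayed text have already been extracted (for example by speech recognition and OCR). The function names `train_cooccurrence` and `rank_spoken_words`, the smoothing parameter `alpha`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def train_cooccurrence(videos):
    """Count how often each spoken word co-occurs with each displayed word.

    `videos` is an iterable of (spoken_words, displayed_words) pairs,
    one pair per video or video segment (assumed pre-extracted).
    """
    counts = defaultdict(Counter)  # counts[displayed_word][spoken_word]
    vocab = set()                  # spoken-word vocabulary, used for smoothing
    for spoken, displayed in videos:
        vocab.update(spoken)
        for text_word in set(displayed):
            counts[text_word].update(spoken)
    return counts, vocab


def rank_spoken_words(spoken, displayed, counts, vocab, alpha=1.0):
    """Rank spoken words by their average smoothed P(spoken | displayed word).

    Add-alpha smoothing is an assumption of this sketch, chosen so that
    unseen word pairs still receive a small nonzero probability.
    """
    def score(word):
        total = 0.0
        for text_word in displayed:
            row = counts.get(text_word, Counter())
            total += (row[word] + alpha) / (sum(row.values()) + alpha * len(vocab))
        return total / max(len(displayed), 1)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(set(spoken), key=lambda w: (-score(w), w))


# Toy corpus: each entry pairs a transcript with the text shown on screen.
corpus = [
    (["gradient", "descent", "um", "the"], ["gradient", "descent"]),
    (["matrix", "the", "um"], ["matrix"]),
    (["gradient", "the", "loss"], ["gradient", "loss"]),
]
counts, vocab = train_cooccurrence(corpus)
ranked = rank_spoken_words(["um", "matrix", "gradient"], ["gradient"], counts, vocab)
print(ranked)
```

In this sketch, spoken words that frequently accompany the displayed text in the training videos rise to the top, while words that rarely co-occur with it fall to the bottom. A latent variable model, as named in the title, would replace the direct counts with probabilities mediated by hidden topics.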

Public/Granted literature

US20150243276A1 SYSTEMS AND METHODS FOR USING LATENT VARIABLE MODELING FOR MULTI-MODAL VIDEO INDEXING Public/Granted day: 2015-08-27

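IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/04	.Segmentation; word boundary detection
G10L15/05	..Word boundary detection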