-
Publication number: WO2007056032A1
Publication date: 2007-05-18
Application number: PCT/US2006/042733
Application date: 2006-10-31
Applicant: MICROSOFT CORPORATION
Inventor: ACERO, Alejandro , CHELBA, Ciprian I. , SANCHEZ, Jorge Silva F.
CPC classification number: G06F17/30778 , G06F17/30746 , G06F17/30749 , G10L15/197
Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words of the speech data and combining it with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same and considered only different categories.
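The single-index idea in the abstract above can be illustrated with a small inverted index in which every posting carries a category label, so recognized speech words and text meta-data words share one structure. This is a minimal sketch, not the patented method: the `Posting` fields, the category names ("speech", "title"), and the choice to give meta-data words a probability of 1.0 are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: str
    category: str       # hypothetical labels, e.g. "speech", "title"
    position: int       # word position within that category's stream
    probability: float  # probability that the word occurs at this position


def text_tokens(text):
    """Turn text meta-data into (word, position, probability) tuples.

    Assumption: meta-data words are certain, so each gets probability 1.0.
    """
    return [(word, i, 1.0) for i, word in enumerate(text.lower().split())]


def build_index(documents):
    """Build one inverted index over all categories.

    documents maps doc_id -> {category: [(word, position, probability), ...]}.
    Speech and meta-data tokens are handled identically; only the category differs.
    """
    index = defaultdict(list)
    for doc_id, categories in documents.items():
        for category, tokens in categories.items():
            for word, position, prob in tokens:
                index[word].append(Posting(doc_id, category, position, prob))
    return index
```

For example, a document whose speech data yields `("hello", 0, 0.9)` and whose title is "Hello Seattle" produces two postings under `"hello"`, one per category, in the same index entry.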
-
Publication number: WO2007056029A1
Publication date: 2007-05-18
Application number: PCT/US2006/042723
Application date: 2006-10-31
Applicant: MICROSOFT CORPORATION
Inventor: ACERO, Alejandro , CHELBA, Ciprian I. , SANCHEZ, Jorge Silva F.
CPC classification number: G06F17/30778 , G06F17/30746 , G10L15/197
Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
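The indexing-and-pruning step described in the abstract above can be sketched as follows. Assumptions, none taken from the source: the alternative word sequences arrive as a scored N-best list (the patent may use a lattice instead), the probability that a word appears in the segment is the normalized sum of the scores of the alternatives containing it, and the index is a dict mapping each word to `{segment_id: probability}`. The function name `index_segment` is hypothetical.

```python
from collections import defaultdict


def index_segment(index, segment_id, alternatives, threshold):
    """Index one speech segment from scored alternative word sequences.

    alternatives: list of (word_sequence, score) pairs, at least two.
    index: defaultdict(dict) mapping word -> {segment_id: probability}.
    """
    total = sum(score for _, score in alternatives)
    word_prob = defaultdict(float)
    for words, score in alternatives:
        for word in set(words):  # count a word once per alternative
            word_prob[word] += score / total

    # Place information for every word in its index entry.
    for word, prob in word_prob.items():
        index[word][segment_id] = prob

    # Eliminate this segment from entries whose probability is below the threshold.
    for word, prob in word_prob.items():
        if prob < threshold:
            del index[word][segment_id]
            if not index[word]:
                del index[word]
    return index
```

With alternatives `["recognize", "speech"]` (0.5), `["wreck", "a", "nice", "beach"]` (0.25) and `["recognize", "beach"]` (0.25) and a threshold of 0.3, "recognize" (0.75), "speech" (0.5) and "beach" (0.5) stay in the index, while "wreck", "a" and "nice" (0.25 each) are pruned.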
-
Publication number: EP1952270A1
Publication date: 2008-08-06
Application number: EP06827328.3
Application date: 2006-10-31
Applicant: Microsoft Corporation
Inventor: ACERO, Alejandro , CHELBA, Ciprian I. , SANCHEZ, Jorge Silva F.
CPC classification number: G06F17/30778 , G06F17/30746 , G06F17/30749 , G10L15/197
Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words of the speech data and combining it with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same and considered only different categories.
-
Publication number: EP1949260A1
Publication date: 2008-07-30
Application number: EP06836786.1
Application date: 2006-10-31
Applicant: Microsoft Corporation
Inventor: ACERO, Alejandro , CHELBA, Ciprian I. , SANCHEZ, Jorge Silva F.
CPC classification number: G06F17/30778 , G06F17/30746 , G10L15/197
Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
-