-
Publication number: WO02071391A3
Publication date: 2002-11-21
Application number: PCT/GB0200889
Filing date: 2002-02-28
Inventor: EPSTEIN MARK EDWARD
CPC classification number: G10L15/197 , G10L15/183
Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.
-
Publication number: JP2000013510A
Publication date: 2000-01-14
Application number: JP12699
Filing date: 1999-01-04
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , DIMITORI KANEVSKI , MAES STEPHANE HERMAN
IPC: G06F13/00 , H04M1/663 , H04M3/00 , H04M3/42 , H04M3/436 , H04M3/527 , H04M3/53 , H04M3/54 , H04M11/00 , H04Q3/545 , H04L12/54 , H04L12/58
Abstract: PROBLEM TO BE SOLVED: To perform automatic call and data transfer processing based on at least one of the recognition of a caller or an author and the time of calling or messaging, by providing a switching means that processes a call according to at least one of the identification of the caller and the subject of the call, and a programming means for the system. SOLUTION: A server 20 is programmed to automatically answer an incoming telephone call, e-mail, facsimile/modem transmission, etc. When the received incoming call is a telephone call, a recording means 40 records audio data. Next, the identity of the caller is determined. More specifically, the caller's spoken statements and responses are transmitted to a speaker recognition module 22 and compared with a previously stored speaker model. Where identification is possible, it is performed using both the module 22 and an ASR/NLU module 24. The caller ID may also be used for identification.
-
Publication number: CA2437620C
Publication date: 2005-04-12
Application number: CA2437620
Filing date: 2002-02-28
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD
Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.
-
Publication number: AU2003222602A1
Publication date: 2003-11-11
Application number: AU2003222602
Filing date: 2003-04-10
Applicant: IBM
Inventor: WARD ROBERT TODD , EPSTEIN MARK EDWARD , JONES SHARON BARBARA
Abstract: A method of developing natural language understanding (NLU) applications can include determining NLU interpretation information from an NLU training corpus of text using a multi-pass processing technique. The alteration of one pass automatically can alter an input for a subsequent pass. The NLU interpretation information can specify an interpretation of at least part of the NLU training corpus of text. The NLU interpretation information can be stored in a database, and selected items of the NLU interpretation information can be presented in a graphical editor. User specified edits also can be received in the graphical editor.
-
Publication number: DE69839068T2
Publication date: 2009-01-22
Application number: DE69839068
Filing date: 1998-12-21
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , KANEVSKY DIMITRI , MAES STEPHAN HERMAN
IPC: G06F13/00 , H04M3/436 , H04M1/663 , H04M3/00 , H04M3/42 , H04M3/527 , H04M3/53 , H04M3/54 , H04M11/00 , H04Q3/545
Abstract: A programmable automatic call and data transfer processing system which automatically processes incoming telephone calls, facsimiles and e-mails based on the identity of the caller or author, the subject matter of the message or request, and/or the time of day, which includes: a central server for automatically answering an incoming call and collecting voice data of a caller; a speaker recognition module connected to the server for identifying the caller or author; a switching module responsive to the speaker recognition module for processing the call or message in accordance with a pre-programmed procedure based on the identification of the caller or author; and a programming interface for programming the server, the speaker recognition module and the switching module. The system is programmed by the user so as to process incoming telephone calls or e-mail and facsimile messages based on the identity of the caller or author, the subject matter and content of the message, and the time of day. Such processing includes, but is not limited to, switching the call to another system, forwarding the call to another telephone terminal, placing the call on hold, or disconnecting the call. In another aspect of the present invention, the system may be employed to process information retrieved from other telecommunication devices such as voice mail, facsimile/modem or e-mail. The system is capable of tagging the identity of a caller or of participants in a teleconference, and transcribing the teleconferences, phone conversations and messages of such callers and participants. The system can automatically index or prioritize the received calls, messages, e-mails and facsimiles according to the caller identification or subject matter of the conversation or message, and allow the user to retrieve messages that originated from a specific source or caller, or calls which deal with similar or specific subject matter.
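The pre-programmed call handling described in this abstract can be pictured as a small rule table. The following is only an illustrative sketch, not the patented system: the `Rule` class, the `route` function, the action strings and the "voicemail" default are all hypothetical names, and the caller identity and subject are assumed to have already been produced by the speaker recognition and ASR/NLU stages.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One user-programmed rule (hypothetical structure)."""
    action: str                      # e.g. "forward:mobile", "hold", "disconnect"
    caller: Optional[str] = None     # None means "any caller"
    subject: Optional[str] = None    # None means "any subject"
    start: time = time(0, 0)         # start of the time-of-day window
    end: time = time(23, 59, 59)     # end of the time-of-day window

    def matches(self, caller: str, subject: str, now: time) -> bool:
        # A rule applies when every condition it specifies is satisfied.
        return ((self.caller is None or self.caller == caller)
                and (self.subject is None or self.subject == subject)
                and self.start <= now <= self.end)


def route(rules, caller, subject, now, default="voicemail"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(caller, subject, now):
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule("forward:mobile", caller="alice"),
        Rule("hold", subject="billing", start=time(9, 0), end=time(17, 0)),
    ]
    print(route(rules, "bob", "billing", time(10, 30)))
```

First-match ordering is one simple design choice; the abstract does not say how conflicting rules are resolved.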
-
Publication number: CA2437620A1
Publication date: 2002-09-12
Application number: CA2437620
Filing date: 2002-02-28
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD
Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.
-
Publication number: DE69423692T2
Publication date: 2000-09-28
Application number: DE69423692
Filing date: 1994-09-08
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , GOPALAKRISHNAN PONANI S , NAHAMOO DAVID , PICHENY MICHAEL ALAN , SEDIVY JAN
Abstract: A speech coding apparatus and method uses classification rules to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. Classification rules map each feature vector signal from a set of all possible feature vector signals to exactly one of at least two different classes of prototype vector signals. Each class contains a plurality of prototype vector signals. According to the classification rules, a first feature vector signal is mapped to a first class of prototype vector signals. The closeness of the feature value of the first feature vector signal is compared to the parameter values of only the prototype vector signals in the first class of prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first class. At least the identification value of at least the prototype vector signal having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
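The saving described in this abstract comes from scoring each feature vector against the prototypes of one class only. A minimal sketch of that idea follows, under stated assumptions: the sign-based `classify` rule, the two-class prototype table, and the negative squared Euclidean distance used as the match score are all hypothetical stand-ins, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prototype:
    ident: str      # identification value emitted as the coded output
    mean: tuple     # parameter values (assumed here to be a mean vector)


def classify(feature):
    """Hypothetical classification rule: maps every feature vector to
    exactly one class, here by the sign of its first component."""
    return "low" if feature[0] < 0.0 else "high"


# Hypothetical prototype classes; each class holds several prototypes.
CLASSES = {
    "low": [Prototype("p1", (-1.0, 0.0)), Prototype("p2", (-2.0, 1.0))],
    "high": [Prototype("p3", (1.0, 0.0)), Prototype("p4", (2.0, 1.0))],
}


def match_score(feature, proto):
    """Assumed score: negative squared Euclidean distance (higher is closer)."""
    return -sum((f - m) ** 2 for f, m in zip(feature, proto.mean))


def code_frame(feature):
    """Score the feature vector against the prototypes of its class only."""
    candidates = CLASSES[classify(feature)]
    best = max(candidates, key=lambda p: match_score(feature, p))
    return best.ident


def code_utterance(features):
    """Code a series of feature vectors, one per time interval."""
    return [code_frame(f) for f in features]
```

With two classes of equal size, each frame is compared against half of the prototypes instead of all of them, at the cost of occasionally missing a closer prototype that sits in the other class.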
-
Publication number: ES2227421T3
Publication date: 2005-04-01
Application number: ES02700489
Filing date: 2002-02-28
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD
Abstract: A method for creating a hierarchy of contextual models, the method comprising: a) measuring the distance between each of a plurality of contextual models using a distance metric, wherein at least one of said plurality of contextual models corresponds to a part of a document or a user response within a dialog-based system; b) identifying two of said plurality of contextual models, the identified contextual models being closer in distance than the others of said plurality of contextual models; c) merging said identified contextual models into a parent contextual model; d) repeating steps a), b), and c) until a hierarchy of said plurality of contextual models is created, the hierarchy having a root node; and e) statistically smoothing said hierarchy of said plurality of contextual models, resulting in a speech recognition language model.
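Steps a) through e) of this abstract amount to bottom-up clustering followed by smoothing. The sketch below is one possible reading, not the claimed implementation: it assumes each contextual model is a unigram word-count table, uses an L1 distance between word distributions as the distance metric, merges by adding counts, and smooths by interpolating from a leaf up to the root with a fixed weight. The names `Node`, `distance`, `build_hierarchy` and `smoothed_prob` are hypothetical.

```python
from collections import Counter
from itertools import combinations


class Node:
    """A contextual model, assumed here to be a unigram count table."""

    def __init__(self, counts, children=()):
        self.counts = Counter(counts)
        self.children = tuple(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def prob(self, word):
        total = sum(self.counts.values())
        return self.counts[word] / total if total else 0.0


def distance(a, b):
    """Step a): assumed metric, L1 distance between word distributions."""
    vocab = set(a.counts) | set(b.counts)
    return sum(abs(a.prob(w) - b.prob(w)) for w in vocab)


def build_hierarchy(leaves):
    """Steps b) to d): merge the closest pair until one root remains."""
    active = list(leaves)
    while len(active) > 1:
        x, y = min(combinations(active, 2), key=lambda pair: distance(*pair))
        parent = Node(x.counts + y.counts, children=(x, y))  # step c)
        active = [n for n in active if n is not x and n is not y] + [parent]
    return active[0]


def smoothed_prob(leaf, word, weight=0.5):
    """Step e): interpolate along the path from the leaf to the root.
    Each node takes `weight` of the remaining mass; the root takes the rest."""
    p, remaining, node = 0.0, 1.0, leaf
    while node is not None:
        share = remaining if node.parent is None else remaining * weight
        p += share * node.prob(word)
        remaining -= share
        node = node.parent
    return p
```

Because the interpolation weights along any leaf-to-root path sum to one, the smoothed values still form a probability distribution over the vocabulary, and a word unseen in a leaf model inherits probability from its ancestors.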
-
Publication number: SG43733A1
Publication date: 1997-11-14
Application number: SG1996000324
Filing date: 1994-09-08
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , GOPALAKRISHNAN PONANI S , NAHAMOO DAVID , PICHENY MICHAEL ALAN , SEDIVY JAN
IPC: G10L15/02 , G10L19/00 , G10L19/02 , H03M7/30 , H04B14/04 , G10L5/06 , G10L3/00 , G10L5/00 , G10L7/08 , G10L9/06 , G10L9/18
Abstract: A speech coding apparatus and method uses classification rules to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. Classification rules map each feature vector signal from a set of all possible feature vector signals to exactly one of at least two different classes of prototype vector signals. Each class contains a plurality of prototype vector signals. According to the classification rules, a first feature vector signal is mapped to a first class of prototype vector signals. The closeness of the feature value of the first feature vector signal is compared to the parameter values of only the prototype vector signals in the first class of prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first class. At least the identification value of at least the prototype vector signal having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
-
Publication number: DE69839068D1
Publication date: 2008-03-20
Application number: DE69839068
Filing date: 1998-12-21
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , KANEVSKY DIMITRI , MAES STEPHAN HERMAN
IPC: G06F13/00 , H04M3/436 , H04M1/663 , H04M3/00 , H04M3/42 , H04M3/527 , H04M3/53 , H04M3/54 , H04M11/00 , H04Q3/545
Abstract: A programmable automatic call and data transfer processing system which automatically processes incoming telephone calls, facsimiles and e-mails based on the identity of the caller or author, the subject matter of the message or request, and/or the time of day, which includes: a central server for automatically answering an incoming call and collecting voice data of a caller; a speaker recognition module connected to the server for identifying the caller or author; a switching module responsive to the speaker recognition module for processing the call or message in accordance with a pre-programmed procedure based on the identification of the caller or author; and a programming interface for programming the server, the speaker recognition module and the switching module. The system is programmed by the user so as to process incoming telephone calls or e-mail and facsimile messages based on the identity of the caller or author, the subject matter and content of the message, and the time of day. Such processing includes, but is not limited to, switching the call to another system, forwarding the call to another telephone terminal, placing the call on hold, or disconnecting the call. In another aspect of the present invention, the system may be employed to process information retrieved from other telecommunication devices such as voice mail, facsimile/modem or e-mail. The system is capable of tagging the identity of a caller or of participants in a teleconference, and transcribing the teleconferences, phone conversations and messages of such callers and participants. The system can automatically index or prioritize the received calls, messages, e-mails and facsimiles according to the caller identification or subject matter of the conversation or message, and allow the user to retrieve messages that originated from a specific source or caller, or calls which deal with similar or specific subject matter.