-
Publication No.: CA1262188A
Publication Date: 1989-10-03
Application No.: CA528790
Application Date: 1987-02-02
Applicant: IBM
Inventor: BAHL LALIT R , BROWN PETER F , DESOUZA PETER V , MERCER ROBERT L
Title: IMPROVING THE TRAINING OF MARKOV MODELS USED IN A SPEECH RECOGNITION SYSTEM
Abstract: In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model and wherein each probabilistic model has transition probability items and output probability items and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biasing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts, from which probability items are derived, are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
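The count adjustment described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the update formula, so the additive rule, the `step` size, and the `floor` that keeps counts positive are all assumptions made for illustration; the function names are hypothetical.

```python
def adjust_counts(counts, correct_events, incorrect_events, step=1.0, floor=1e-6):
    """Return new counts, raised for events observed with the correct word's
    model and lowered for events observed with an incorrect word's model.

    The additive update, `step`, and `floor` are assumptions of this sketch.
    """
    updated = dict(counts)  # leave the caller's counts untouched
    for event, occurrences in correct_events.items():
        updated[event] = updated.get(event, 0.0) + step * occurrences
    for event, occurrences in incorrect_events.items():
        # Clamp at a small positive floor so derived probabilities stay valid.
        updated[event] = max(floor, updated.get(event, 0.0) - step * occurrences)
    return updated


def counts_to_probabilities(counts):
    """Derive probability items from counts by simple normalization."""
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


# Two competing transitions out of one state, with equal starting counts.
counts = {"t1": 4.0, "t2": 4.0}
updated = adjust_counts(counts, correct_events={"t1": 2}, incorrect_events={"t2": 1})
probabilities = counts_to_probabilities(updated)
```

In this toy case the transition favored by the correct word ends up with the larger share of the probability mass, which is the direction of bias the abstract describes.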
-
Publication No.: CA1259411A
Publication Date: 1989-09-12
Application No.: CA504807
Application Date: 1986-03-24
Applicant: IBM
Inventor: BAHL LALIT R , DESOUZA PETER V , MERCER ROBERT L , PICHENY MICHAEL A
IPC: G10L9/02
Title: SPEECH RECOGNITION EMPLOYING A SET OF MARKOV MODELS THAT INCLUDES MARKOV MODELS REPRESENTING TRANSITIONS TO AND FROM SILENCE
Abstract: The present invention relates to apparatus and method for constructing word baseforms which can be matched against a string of generated acoustic labels which includes: forming a set of phonetic phone machines, wherein each phone machine has (i) a plurality of states, (ii) a plurality of transitions each of which extends from a state to a state, (iii) a stored probability for each transition, and (iv) stored label output probabilities, each label output probability corresponding to the probability of said each phone machine producing a corresponding label; wherein said set of phonetic machines is formed to include a subset of onset phone machines, the stored probabilities of each onset phone machine corresponding to at least one phonetic element being uttered at the beginning of a speech segment; and wherein said set of phonetic machines is formed to include a subset of trailing phone machines, the stored probabilities of each trailing phone machine corresponding to at least one single phonetic element being uttered at the end of a speech segment. Word baseforms are constructed by concatenating phone machines selected from the set.
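As a rough illustration of the data involved, the following Python sketch models a phone machine with the four components listed in the abstract and builds a baseform by concatenation. The field layout, the `kind` tag, and the phone names (`ONSET_K`, `AE`, `TRAIL_T`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneMachine:
    name: str
    kind: str  # "onset", "trailing", or "common" (tags assumed for this sketch)
    n_states: int
    transitions: dict = field(default_factory=dict)  # (from_state, to_state) -> probability
    label_probs: dict = field(default_factory=dict)  # label -> output probability


def build_baseform(phone_names, inventory):
    """Concatenate the phone machines named in phone_names, in order."""
    return [inventory[name] for name in phone_names]


# A toy inventory with one onset, one ordinary, and one trailing phone machine.
inventory = {
    "ONSET_K": PhoneMachine("ONSET_K", "onset", 2, {(0, 1): 1.0}, {"L1": 0.7, "L2": 0.3}),
    "AE": PhoneMachine("AE", "common", 2, {(0, 0): 0.4, (0, 1): 0.6}, {"L3": 1.0}),
    "TRAIL_T": PhoneMachine("TRAIL_T", "trailing", 2, {(0, 1): 1.0}, {"L4": 0.9, "L2": 0.1}),
}
baseform = build_baseform(["ONSET_K", "AE", "TRAIL_T"], inventory)
```

How a real system decides when to select an onset or trailing machine is not stated in the abstract; here the caller simply names them explicitly.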
-
Publication No.: CA1236577A
Publication Date: 1988-05-10
Application No.: CA494697
Application Date: 1985-11-06
Applicant: IBM
Inventor: BAHL LALIT R , MERCER ROBERT L , DEGENNARO STEVEN V
IPC: G10L5/00
Abstract: The invention herein provides, in a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) at least one transition and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of: forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.
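The simplification step can be sketched in Python as follows. The abstract says only that a single specific value replaces the per-transition probabilities of each label; taking the maximum over transitions is an assumption of this sketch, as is the crude scoring function, which ignores transition probabilities entirely.

```python
import math


def simplify_phone_machine(label_probs_by_transition):
    """Collapse {transition: {label: prob}} into {label: single value}.

    The single value chosen here is the maximum over all transitions at
    which the label may be generated (an assumption of this sketch).
    """
    simplified = {}
    for probs in label_probs_by_transition.values():
        for label, prob in probs.items():
            simplified[label] = max(simplified.get(label, 0.0), prob)
    return simplified


def approximate_log_score(simplified, labels):
    """Rough log score of a label string under the simplified machine.

    Transition probabilities are deliberately ignored in this toy version.
    """
    return sum(math.log(simplified[label]) for label in labels)


actual = {
    "t1": {"A": 0.2, "B": 0.8},
    "t2": {"A": 0.5, "B": 0.5},
}
simplified = simplify_phone_machine(actual)
score = approximate_log_score(simplified, ["A", "B"])
```

Because each label now has one value per phone machine regardless of transition, the match computation no longer needs to track which transition produced which label.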
-
Publication No.: CA2125200A1
Publication Date: 1995-04-29
Application No.: CA2125200
Application Date: 1994-06-06
Applicant: IBM
Inventor: BERGER ADAM L , BROWN PETER F , DELLA PIETRA STEPHEN A , DELLA PIETRA VINCENT J , KEHLER ANDREW S , MERCER ROBERT L
-
Publication No.: CA1241751A
Publication Date: 1988-09-06
Application No.: CA504801
Application Date: 1986-03-24
Applicant: IBM
Inventor: BAHL LALIT R , DESOUZA PETER V , MERCER ROBERT L , PICHENY MICHAEL A
Abstract: The present invention addresses the problem of constructing fenemic baseforms which take into account variations in pronunciation of words from one utterance thereof to another. Specifically, the invention relates to a method of constructing a fenemic baseform for a word in a vocabulary of word segments including the steps of: (a) transforming multiple utterances of the word into respective strings of fenemes; (b) defining a set of fenemic Markov model phone machines; (c) determining the best single phone machine P1 for producing the multiple feneme strings; (d) determining the best two phone baseform of the form P1P2 or P2P1 for producing the multiple feneme strings; (e) aligning the best two phone baseform against each feneme string; (f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform; (g) identifying each left portion as a left substring and each right portion as a right substring; (h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and (k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
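Steps (c) through (k) amount to a divide-and-conquer procedure, which the following Python sketch illustrates on a toy model. Several simplifications are assumptions of the sketch and not taken from the abstract: each phone machine is reduced to a table of feneme output probabilities, a string is scored as a product of those probabilities, split points must be interior, and a fixed `phone_cost` stands in for the transition probabilities that make longer baseforms less likely in a real model.

```python
import math


def single_logp(phone, string, phones):
    """Log probability of a feneme string under one toy phone machine."""
    return sum(math.log(phones[phone][feneme]) for feneme in string)


def best_split(first, second, string, phones):
    """Best (score, split point) for aligning string against first + second."""
    candidates = (
        (single_logp(first, string[:k], phones) + single_logp(second, string[k:], phones), k)
        for k in range(1, len(string))
    )
    return max(candidates)


def grow_baseform(strings, phones, phone_cost=1.0):
    # (c) best single phone machine P1 for all strings together
    p1 = max(phones, key=lambda p: sum(single_logp(p, s, phones) for s in strings))
    single_score = sum(single_logp(p1, s, phones) for s in strings) - phone_cost
    if any(len(s) < 2 for s in strings):
        return [p1]  # nothing left to split
    # (d) best two-phone baseform of the form P1P2 or P2P1
    best_score, best_points = -math.inf, None
    for p2 in phones:
        for first, second in ((p1, p2), (p2, p1)):
            splits = [best_split(first, second, s, phones) for s in strings]
            score = sum(sc for sc, _ in splits) - 2 * phone_cost
            if score > best_score:
                best_score, best_points = score, [k for _, k in splits]
    # (h) inhibit splitting when the single phone is at least as likely
    if single_score >= best_score:
        return [p1]
    # (e)-(g) split every string at its aligned boundary
    lefts = [s[:k] for s, k in zip(strings, best_points)]
    rights = [s[k:] for s, k in zip(strings, best_points)]
    # (h), (k) recurse on both halves and concatenate the unsplit phones
    return grow_baseform(lefts, phones, phone_cost) + grow_baseform(rights, phones, phone_cost)


phones = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}
baseform = grow_baseform(["aabb", "aaabb"], phones)
```

With two utterances that differ in length, the sketch splits each string at its own best boundary, so the resulting baseform reflects both pronunciations.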
-
Publication No.: CA2091912C
Publication Date: 1996-12-03
Application No.: CA2091912
Application Date: 1993-03-18
Applicant: IBM
Inventor: BROWN PETER F , DELLA PIETRA STEPHEN A , DELLA PIETRA VINCENT J , JELINEK FREDERICK , MERCER ROBERT L
Abstract: A speech recognition system displays a source text of one or more words in a source language. The system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized. The utterance comprises a series of one or more words in a target language different from the source language. A set of one or more speech hypotheses, each comprising one or more words from the target language, is produced. Each speech hypothesis is modeled with an acoustic model. An acoustic match score for each speech hypothesis comprises an estimate of the closeness of a match between the acoustic model of the speech hypothesis and the sequence of coded representations of the utterance. A translation match score for each speech hypothesis comprises an estimate of the probability of occurrence of the speech hypothesis given the occurrence of the source text. A hypothesis score for each hypothesis comprises a combination of the acoustic match score and the translation match score. At least one word of one or more speech hypotheses having the best hypothesis scores is output as a recognition result.
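The score combination can be sketched in a few lines of Python. The abstract says only that the hypothesis score is "a combination" of the two match scores; adding log-domain scores with an optional `weight` is an assumption of this sketch, and the example scores are made up.

```python
def hypothesis_score(acoustic_logp, translation_logp, weight=1.0):
    """Combine the two match scores in the log domain.

    A weighted sum is assumed here; the abstract does not fix the rule.
    """
    return acoustic_logp + weight * translation_logp


def best_hypothesis(hypotheses, weight=1.0):
    """Return the hypothesis with the highest combined score."""
    return max(
        hypotheses,
        key=lambda h: hypothesis_score(h["acoustic"], h["translation"], weight),
    )


# Illustrative values: the second hypothesis sounds slightly closer, but
# the first is far more probable as a translation of the source text.
hypotheses = [
    {"words": ("the", "house"), "acoustic": -10.0, "translation": -2.0},
    {"words": ("the", "mouse"), "acoustic": -9.5, "translation": -6.0},
]
winner = best_hypothesis(hypotheses)
```

Setting `weight` to zero reduces the sketch to a purely acoustic decision, which makes the contribution of the translation score easy to see.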
-
Publication No.: CA1248633A
Publication Date: 1989-01-10
Application No.: CA504800
Application Date: 1986-03-24
Applicant: IBM
Inventor: BAHL LALIT R , JELINEK FREDERICK , MERCER ROBERT L
Title: APPARATUS AND METHOD FOR DETERMINING A LIKELY WORD SEQUENCE FROM LABELS GENERATED BY AN ACOUSTIC PROCESSOR
Abstract: The present invention addresses the problem of determining, in a speech recognition context, a likely sequence or path of words from a plurality of word paths given a string of labels that are generated at successive intervals. The invention features multiple stack decoding and a unique strategy for extending one word path at a time without undue reliance on word path length. With multiple stack decoding, a stack is associated with each label of the label string. Word paths that most likely end at a given label are assigned to the stack corresponding to the given label and are ordered according to likelihood at the given label. The strategy of deciding which word path to extend includes the forming of a likelihood envelope against which the word paths are compared to determine if a word path is sufficiently likely to be extended. From among the word paths that are found to be extendible, the word path of highest likelihood in the earliest stack (i.e., the shortest most likely word path) is selected for extension. After a word path is extended, it is deleted from its stack and the word paths extended therefrom are entered into appropriate stacks.
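The selection strategy can be sketched in Python as below. The abstract does not say how the likelihood envelope is computed, so this sketch assumes each word path carries a log-likelihood value for every label up to its end, takes the envelope as the per-label maximum over all paths, and treats a path as extendible when its end likelihood is within `delta` of the envelope. The class and function names are hypothetical.

```python
from typing import NamedTuple


class WordPath(NamedTuple):
    words: tuple
    profile: tuple  # log-likelihood after each label, up to the path's end label


def likelihood_envelope(paths, n_labels):
    """Per-label maximum log-likelihood over all paths (assumed definition)."""
    envelope = [float("-inf")] * n_labels
    for path in paths:
        for i, loglik in enumerate(path.profile):
            envelope[i] = max(envelope[i], loglik)
    return envelope


def select_path_to_extend(stacks, delta):
    """Pick the most likely extendible path from the earliest stack.

    stacks[t] holds the word paths that end at label t.
    """
    all_paths = [path for stack in stacks for path in stack]
    envelope = likelihood_envelope(all_paths, len(stacks))
    for t, stack in enumerate(stacks):  # earliest stack first
        extendible = [p for p in stack if p.profile[-1] >= envelope[t] - delta]
        if extendible:
            return max(extendible, key=lambda p: p.profile[-1])
    return None


stacks = [
    [],
    [WordPath(("the",), (-1.0, -2.0))],
    [WordPath(("a", "cat"), (-0.5, -1.0, -1.5)), WordPath(("uh",), (-3.0, -6.0, -9.0))],
]
chosen = select_path_to_extend(stacks, delta=2.0)
```

A full decoder would then remove the chosen path from its stack, extend it by each candidate word, and push the extended paths onto the stacks for the labels at which they most likely end.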
-
Publication No.: CA1246229A
Publication Date: 1988-12-06
Application No.: CA504806
Application Date: 1986-03-24
Applicant: IBM
Inventor: BAHL LALIT R , DESOUZA PETER V , MERCER ROBERT L
IPC: G06F1/00
Title: APPARATUS AND METHOD FOR PRODUCING A LIST OF LIKELY CANDIDATE WORDS CORRESPONDING TO A SPOKEN INPUT
Abstract: A speech recognition apparatus and method of selecting likely words from a vocabulary of words, wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, include: (a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote; (b) forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word; and (c) for a given string of labels, determining the likelihood of a particular word which includes the step of combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.
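Step (c) can be illustrated with a short Python sketch. The table values below are made up, and two details are assumptions: votes are summed once per occurrence of a label in the string, and penalties are subtracted once per alphabet label that never occurs in the string.

```python
def word_score(word, labels, votes, penalties, alphabet):
    """Combine votes of labels in the string with penalties of absent labels."""
    present = set(labels)
    vote_total = sum(votes[label][word] for label in labels)
    penalty_total = sum(
        penalties[label][word] for label in alphabet if label not in present
    )
    return vote_total - penalty_total


def rank_words(vocabulary, labels, votes, penalties, alphabet):
    """Order candidate words from most to least likely under this score."""
    return sorted(
        vocabulary,
        key=lambda w: word_score(w, labels, votes, penalties, alphabet),
        reverse=True,
    )


alphabet = ["A", "B", "C"]
# votes[label][word] and penalties[label][word]; values are illustrative only.
votes = {
    "A": {"cat": 2.0, "dog": 0.5},
    "B": {"cat": 1.0, "dog": 1.5},
    "C": {"cat": 0.1, "dog": 2.0},
}
penalties = {
    "A": {"cat": 1.0, "dog": 0.2},
    "B": {"cat": 0.5, "dog": 0.8},
    "C": {"cat": 0.1, "dog": 1.5},
}
ranking = rank_words(["cat", "dog"], ["A", "B", "A"], votes, penalties, alphabet)
```

Both tables can be built once in advance, so scoring a word needs only table lookups and additions, which suits a fast first pass that prunes the vocabulary before a detailed match.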
-
Publication No.: CA1236578A
Publication Date: 1988-05-10
Application No.: CA496161
Application Date: 1985-11-26
Applicant: IBM
Inventor: BAHL LALIT R , DESOUZA PETER V , MERCER ROBERT L , PICHENY MICHAEL A
IPC: G10L5/00
Title: FENEME-BASED MARKOV MODELS FOR WORDS
Abstract: In a speech recognition system, apparatus and method for modelling words with label-based Markov models are disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.
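The final step, setting output probabilities, can be sketched in Python as a relative-frequency estimate. The abstract does not describe how standard and personalized labels are aligned or how the probabilities are estimated, so this sketch assumes the alignment is already given as pairs and keeps a single output distribution per model, ignoring individual transitions.

```python
from collections import Counter, defaultdict


def estimate_output_probs(aligned_pairs):
    """Estimate P(personalized label | standard-label model) by counting.

    aligned_pairs is an iterable of (standard_label, personalized_label)
    tuples; the alignment itself is assumed to come from elsewhere.
    """
    counts = defaultdict(Counter)
    for standard, personalized in aligned_pairs:
        counts[standard][personalized] += 1
    return {
        standard: {label: n / sum(c.values()) for label, n in c.items()}
        for standard, c in counts.items()
    }


# Hypothetical label names: S* are standard labels, p* are personalized labels.
pairs = [("S1", "p1"), ("S1", "p1"), ("S1", "p2"), ("S2", "p3")]
output_probs = estimate_output_probs(pairs)
```

A trained system would more plausibly derive these values from forward-backward style counts over the model's transitions; plain counting is used here only to keep the example small.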