-
Publication No.: DE69010941T2
Publication Date: 1995-03-16
Application No.: DE69010941
Application Date: 1990-02-27
Applicant: IBM
Inventor: BAHL LALIT RAI , BROWN PETER FITZHUGH , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: A continuous speech recognition system includes an automatic phonological rules generator which determines variations in the pronunciation of phonemes based on the context in which they occur. This phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text. These sequences are then annotated with their phoneme context from the training text and clustered into groups representing similar pronunciations of each phoneme. A decision tree is generated using the context information of the sequences to predict the clusters to which the sequences belong. The training data is processed by the decision tree to divide the sequences into leaf-groups representing similar pronunciations of each phoneme. The sequences in each leaf-group are clustered into sub-groups representing respectively different pronunciations of their corresponding phoneme in a given context. A Markov model is generated for each sub-group. The various Markov models of a leaf-group are combined into a single compound model by assigning common initial and final states to each model. The compound Markov models are used by a speech recognition system to analyze an unknown sequence of labels given its context.
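The step of combining the Markov models of a leaf-group through common initial and final states can be sketched as follows. This is only an illustration under stated assumptions: the transition-list representation, the state names `INIT` and `FINAL`, the function name `combine_models`, and the per-model `weights` applied to transitions leaving the shared initial state are all hypothetical and are not taken from the patent.

```python
def combine_models(models, weights):
    """Merge several Markov models into one compound model.

    models:  list of transition lists; each transition is (src, dst, prob),
             and every model uses the state names "INIT" and "FINAL".
    weights: one weight per model (assumed here to sum to 1), applied to the
             transitions that leave the shared initial state.
    """
    compound = []
    for i, (transitions, w) in enumerate(zip(models, weights)):
        def rename(state, i=i):
            # INIT and FINAL are shared; interior states stay private per model.
            return state if state in ("INIT", "FINAL") else f"m{i}:{state}"

        for src, dst, p in transitions:
            scale = w if src == "INIT" else 1.0
            compound.append((rename(src), rename(dst), p * scale))
    return compound


# Two hypothetical pronunciation variants of one phoneme.
variant_a = [("INIT", "s1", 1.0), ("s1", "FINAL", 1.0)]
variant_b = [("INIT", "s1", 1.0), ("s1", "s1", 0.5), ("s1", "FINAL", 0.5)]
compound = combine_models([variant_a, variant_b], [0.25, 0.75])
```

Prefixing the interior states keeps the variants from sharing anything except their entry and exit points, so a path through the compound model follows exactly one pronunciation.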
-
Publication No.: DE3852608D1
Publication Date: 1995-02-09
Application No.: DE3852608
Application Date: 1988-10-19
Applicant: IBM
Inventor: BAHL LALIT RAI , BROWN PETER FITZHUGH , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: In order to determine a next event based upon available data, a binary decision tree is constructed having true or false questions at each node and a probability distribution of the unknown next event based upon available data at each leaf. Starting at the root of the tree, the construction process proceeds from node-to-node towards a leaf by answering the question at each node encountered and following either the true or false path depending upon the answer. The questions are phrased in terms of the available data and are designed to provide as much information as possible about the next unknown event. The process is particularly useful in speech recognition when the next word to be spoken is determined on the basis of the previously spoken words.
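The traversal described in this abstract, answering a true/false question at each node until a leaf's probability distribution is reached, might look like the minimal sketch below. The `Node` class, the function `next_word_distribution`, the example question, and the probabilities are hypothetical illustrations, not material from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Node:
    # Internal nodes hold a true/false question about the available data;
    # leaves hold a probability distribution over the next event.
    question: Optional[Callable[[Sequence[str]], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None


def next_word_distribution(root: Node, history: Sequence[str]) -> Dict[str, float]:
    """Walk from the root to a leaf, following the answer at each node."""
    node = root
    while node.distribution is None:
        node = node.if_true if node.question(history) else node.if_false
    return node.distribution


# Hypothetical one-question tree: "Is the previous word 'the'?"
tree = Node(
    question=lambda h: len(h) > 0 and h[-1] == "the",
    if_true=Node(distribution={"cat": 0.6, "dog": 0.4}),
    if_false=Node(distribution={"the": 0.7, "a": 0.3}),
)
dist = next_word_distribution(tree, ["see", "the"])
```

How the questions are chosen so that they carry as much information as possible about the next event is not shown here; the sketch covers only the lookup once a tree exists.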
-
Publication No.: DE69230871D1
Publication Date: 2000-05-11
Application No.: DE69230871
Application Date: 1992-07-10
Applicant: IBM
-
Publication No.: AT187564T
Publication Date: 1999-12-15
Application No.: AT94111148
Application Date: 1994-07-18
Applicant: IBM
Inventor: BERGER ADAM L , BROWN PETER FITZHUGH , DELLA PIETRA STEPHEN ANDREW , KEHLER ANDREW SCOTT , MERCER ROBERT LEROY , DELLA PIETRA VINCENT JOSEPH
IPC: G06F17/28
-
Publication No.: DE3878541T2
Publication Date: 1993-08-12
Application No.: DE3878541
Application Date: 1988-12-12
Applicant: IBM
Inventor: BAHL LALIT RAI , DESOUZA PETER VINCENT , MERCER ROBERT LEROY , PICHENY MICHAEL ALAN
Abstract: In a speech recognition system, a technique is disclosed for segmenting multiple utterances of a vocabulary word in a consistent manner, to determine a Markov model representation for each segment. Plural utterances of a word are converted to label strings. One is selected as prototype, and represented by a sequence of Markov models. All other strings are aligned against the prototype, using stored probabilities, thereby determining substrings and thus segments which correspond to the labels of the prototype sequence. Corresponding segments of all strings are commonly evaluated to determine finally a suitable Markov model representation for each segment. The concatenation represents the baseform for the word.
-
Publication No.: DE3686651D1
Publication Date: 1992-10-08
Application No.: DE3686651
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , MERCER ROBERT LEROY , DEGENNARO STEVEN VINCENT
Abstract: The method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input involves forming simplified phone machines. This includes the operation of replacing by a specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine. The probability of a phone generating the labels in the string is determined based on the simplified phone machine corresponding to it.
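The simplification step, in which one value replaces a label's probability at every transition where that label can be generated, can be illustrated as below. The abstract does not say which value is used; taking the maximum over transitions is an assumption made purely for this sketch, as are the dictionary layout and the name `simplify_phone_machine`.

```python
def simplify_phone_machine(label_probs):
    """Replace each label's per-transition probabilities with one shared value.

    label_probs: {transition: {label: probability}} for one phone machine.
    Assumption: the shared value is the maximum probability of that label
    over all transitions that can generate it.
    """
    labels = {label for probs in label_probs.values() for label in probs}
    shared = {
        label: max(probs[label] for probs in label_probs.values() if label in probs)
        for label in labels
    }
    return {
        transition: {label: shared[label] for label in probs}
        for transition, probs in label_probs.items()
    }


# Hypothetical phone machine with two transitions and two labels.
machine = {"t1": {"A": 0.2, "B": 0.8}, "t2": {"A": 0.5, "B": 0.5}}
simplified = simplify_phone_machine(machine)
```

After simplification a label's probability no longer depends on the transition, which is what makes the subsequent match computation cheaper.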
-
Publication No.: DE3778579D1
Publication Date: 1992-06-04
Application No.: DE3778579
Application Date: 1987-02-20
Applicant: IBM
Inventor: BAHL LALIT RAI , BROWN PETER FITZHUGH , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model and wherein each probabilistic model has transition probability items and output probability items and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biassing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts - from which probability items are derived - are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
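The count adjustment described above, raising counts for events observed under the correct word's model and lowering them for events observed under an incorrect word's model, can be sketched as follows. The function name `adjust_counts`, the `step` scaling factor, and the `floor` that keeps counts non-negative are assumptions for illustration; the abstract does not specify them.

```python
def adjust_counts(counts, correct_events, incorrect_events, step=1.0, floor=0.0):
    """Return a new count table biased toward the correct word.

    counts:           {event: current count}
    correct_events:   {event: occurrences under the correct word's model}
    incorrect_events: {event: occurrences under an incorrect word's model}
    step, floor:      hypothetical scaling factor and lower bound.
    """
    updated = dict(counts)
    for event, n in correct_events.items():
        updated[event] = updated.get(event, 0.0) + step * n
    for event, n in incorrect_events.items():
        updated[event] = max(floor, updated.get(event, 0.0) - step * n)
    return updated


# Hypothetical counts for two probability events.
counts = {"a": 5.0, "b": 2.0}
new_counts = adjust_counts(counts, {"a": 2}, {"b": 3})
```

Probability items would then be re-derived from the adjusted counts, for example by normalizing over the events that share a state; that step is omitted here.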
-
Publication No.: DE3681023D1
Publication Date: 1991-09-26
Application No.: DE3681023
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , DESOUZA PETER VINCENT , MERCER ROBERT LEROY , PICHENY MICHAEL ALAN
Abstract: The speech recognition system has a circuit for generating an alphabet of standard labels in response to a first speech input. Each standard label represents a sound type corresponding to a given interval of time. A circuit produces a respective sequence of standard labels from the alphabet in response to the uttering of each word from a vocabulary of words. A circuit selects a set of personalized labels, each representing a sound type corresponding to an interval of time. A circuit forms a respective probabilistic model for each standard label. Each model includes a number of states and at least one transition extending from a state to a state. It also includes a transition probability for each transition and, for at least one transition, a number of output probabilities. Each output probability at a given transition in the model of a given standard label represents the likelihood of a respective personalized label being produced at the given transition.
-
Publication No.: DE69315374T2
Publication Date: 1998-05-28
Application No.: DE69315374
Application Date: 1993-01-15
Applicant: IBM
Inventor: BROWN PETER FITZHUGH , DELLA PIETRA STEPHEN ANDREW , DELLA PIETRA VINCENT JOSEPH , MERCER ROBERT LEROY , JELINEK FREDERICK
Abstract: A speech recognition system displays a source text of one or more words in a source language. The system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized. The utterance comprises a series of one or more words in a target language different from the source language. A set of one or more speech hypotheses, each comprising one or more words from the target language, is produced. Each speech hypothesis is modeled with an acoustic model. An acoustic match score for each speech hypothesis comprises an estimate of the closeness of a match between the acoustic model of the speech hypothesis and the sequence of coded representations of the utterance. A translation match score for each speech hypothesis comprises an estimate of the probability of occurrence of the speech hypothesis given the occurrence of the source text. A hypothesis score for each hypothesis comprises a combination of the acoustic match score and the translation match score. At least one word of one or more speech hypotheses having the best hypothesis scores is output as a recognition result.
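The scoring step can be sketched as below. The abstract says only that the hypothesis score is "a combination" of the two match scores; summing their logarithms (equivalently, multiplying the two probabilities) is an assumption made for this illustration, and the function name, tuple layout, and example numbers are hypothetical.

```python
import math


def best_hypothesis(hypotheses):
    """Pick the hypothesis with the highest combined score.

    hypotheses: list of (text, acoustic_score, translation_score), with both
    scores treated as probabilities in (0, 1].
    Assumption: the combination is the sum of the two log scores.
    """
    def hypothesis_score(h):
        _, acoustic, translation = h
        return math.log(acoustic) + math.log(translation)

    return max(hypotheses, key=hypothesis_score)[0]


# Hypothetical example: the acoustically stronger candidate loses because it
# is an unlikely translation of the displayed source text.
candidates = [("the cat", 0.5, 0.1), ("the cap", 0.6, 0.01)]
result = best_hypothesis(candidates)
```

Working in the log domain avoids numerical underflow when the scores are products of many small probabilities.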
-