-
Publication No.: DE69224253D1
Publication date: 1998-03-05
Application No.: DE69224253
Application date: 1992-08-31
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , EPSTEIN EDWARD ADAM , LUCASSEN JOHN M , NAHAMOO DAVID , PICHENY MICHAEL ALAN
-
Publication No.: DE69221403T2
Publication date: 1998-02-19
Application No.: DE69221403
Application date: 1992-05-20
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , DE SOUZA PETER VINCENT , NAHAMOO DAVID , PICHENY MICHAEL ALAN
Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. Means are provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
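The clustering step in this abstract can be illustrated with a minimal Python sketch. It assumes the path estimation has already been done, so each feature vector arrives tagged with a hypothetical key `(word_segment, location, elementary_model)`. The function names and data layout are illustrative assumptions, not taken from the patent, and the final combination of cluster values into partition values is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical key: (word-segment id, location in the segment, elementary model id).
Key = Tuple[str, int, str]


def cluster_values(aligned: Iterable[Tuple[Key, List[float]]]) -> Dict[Key, List[float]]:
    """Group feature vectors by alignment key and average each group per dimension."""
    groups: Dict[Key, List[List[float]]] = defaultdict(list)
    for key, vector in aligned:
        groups[key].append(vector)
    return {
        key: [sum(column) / len(vectors) for column in zip(*vectors)]
        for key, vectors in groups.items()
    }


def clusters_by_model(values: Dict[Key, List[float]]) -> Dict[str, List[List[float]]]:
    """Collect the cluster values that belong to the same elementary model."""
    by_model: Dict[str, List[List[float]]] = defaultdict(list)
    for (_segment, _location, model), value in values.items():
        by_model[model].append(value)
    return dict(by_model)


aligned = [
    (("w1", 0, "m1"), [1.0, 2.0]),
    (("w1", 0, "m1"), [3.0, 4.0]),
    (("w2", 1, "m1"), [5.0, 6.0]),
]
values = cluster_values(aligned)
per_model = clusters_by_model(values)
```

Here every cluster holds vectors from one elementary model at one location in one word-segment model, and `per_model["m1"]` lists the cluster values from which a prototype for that model could be assembled.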
-
Publication No.: CA2089786C
Publication date: 1996-12-10
Application No.: CA2089786
Application date: 1993-02-18
Applicant: IBM
Inventor: BAHL LALIT R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , PICHENY MICHAEL A
Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
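The two-pass pruning described in this abstract might be sketched as follows. The scoring and next-word functions are passed in as callables because the abstract does not specify them. All names (`prune_hypotheses`, `n_initial`, `n_reduced`) are hypothetical, and higher scores are assumed to mean closer matches.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]  # a partial hypothesis followed by one candidate word


def prune_hypotheses(
    hypotheses: Sequence[Hypothesis],
    initial_score: Callable[[Hypothesis], float],
    likely_next_word: Callable[[Hypothesis], str],
    revised_score: Callable[[Hypothesis, str], float],
    n_initial: int,
    n_reduced: int,
) -> List[Hypothesis]:
    """Keep the best n_initial hypotheses, rescore them with next-word context,
    then keep the best n_reduced of those."""
    # First pass: rank by the context-independent match score.
    initial_subset = sorted(hypotheses, key=initial_score, reverse=True)[:n_initial]
    # Second pass: rescore using a candidate-word model that depends on
    # the word estimated to follow the hypothesis.
    rescored = [(revised_score(h, likely_next_word(h)), h) for h in initial_subset]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in rescored[:n_reduced]]


# Toy scores, invented purely for illustration.
first_pass = {("a",): 0.9, ("b",): 0.8, ("c",): 0.1}
second_pass = {("a",): 0.2, ("b",): 0.7, ("c",): 0.99}
kept = prune_hypotheses(
    list(first_pass),
    initial_score=first_pass.__getitem__,
    likely_next_word=lambda h: "next",
    revised_score=lambda h, nxt: second_pass[h],
    n_initial=2,
    n_reduced=1,
)
```

In the toy data, `("c",)` is dropped in the first pass despite its high revised score, which shows that only the initial subset is ever rescored.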
-
Publication No.: CA2089786A1
Publication date: 1993-10-25
Application No.: CA2089786
Application date: 1993-02-18
Applicant: IBM
Inventor: BAHL LALIT R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , PICHENY MICHAEL A
Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
-
Publication No.: CA1262188A
Publication date: 1989-10-03
Application No.: CA528790
Application date: 1987-02-02
Applicant: IBM
Inventor: BAHL LALIT R , BROWN PETER F , DESOUZA PETER V , MERCER ROBERT L
Abstract: IMPROVING THE TRAINING OF MARKOV MODELS USED IN A SPEECH RECOGNITION SYSTEM. In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input, wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model, wherein each probabilistic model has transition probability items and output probability items, and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biasing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts, from which probability items are derived, are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
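The count adjustment described in this abstract could look roughly like the sketch below. It assumes the event occurrences for the correct and incorrect word models have already been computed. The `floor` that keeps counts positive and the simple normalization are illustrative assumptions, not details given in the abstract.

```python
from typing import Dict


def adjust_counts(
    counts: Dict[str, float],
    correct_occurrences: Dict[str, float],
    incorrect_occurrences: Dict[str, float],
    floor: float = 1e-6,  # assumption: keep every count strictly positive
) -> Dict[str, float]:
    """Raise counts for events seen with the correct word model and
    lower them for events seen with the incorrect word model(s)."""
    adjusted = {}
    for event, count in counts.items():
        new_count = (
            count
            + correct_occurrences.get(event, 0.0)
            - incorrect_occurrences.get(event, 0.0)
        )
        adjusted[event] = max(new_count, floor)
    return adjusted


def to_probabilities(counts: Dict[str, float]) -> Dict[str, float]:
    """Derive probability items from counts by normalizing them to sum to 1."""
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


counts = {"a": 5.0, "b": 5.0}
adjusted = adjust_counts(counts, {"a": 2.0}, {"b": 1.0})
probabilities = to_probabilities(adjusted)
```

With these toy numbers the count for event `a` rises from 5 to 7 and the count for `b` falls from 5 to 4, so the derived probability of `a` moves from 1/2 to 7/11.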
-
Publication No.: CA1259411A
Publication date: 1989-09-12
Application No.: CA504807
Application date: 1986-03-24
Applicant: IBM
Inventor: BAHL LALIT R , DESOUZA PETER V , MERCER ROBERT L , PICHENY MICHAEL A
IPC: G10L9/02
Abstract: SPEECH RECOGNITION EMPLOYING A SET OF MARKOV MODELS THAT INCLUDES MARKOV MODELS REPRESENTING TRANSITIONS TO AND FROM SILENCE. The present invention relates to apparatus and method for constructing word baseforms which can be matched against a string of generated acoustic labels which includes: forming a set of phonetic phone machines, wherein each phone machine has (i) a plurality of states, (ii) a plurality of transitions each of which extends from a state to a state, (iii) a stored probability for each transition, and (iv) stored label output probabilities, each label output probability corresponding to the probability of said each phone machine producing a corresponding label; wherein said set of phonetic machines is formed to include a subset of onset phone machines, the stored probabilities of each onset phone machine corresponding to at least one phonetic element being uttered at the beginning of a speech segment; and wherein said set of phonetic machines is formed to include a subset of trailing phone machines, the stored probabilities of each trailing phone machine corresponding to at least one single phonetic element being uttered at the end of a speech segment. Word baseforms are constructed by concatenating phone machines selected from the set.
-
Publication No.: CA1236577A
Publication date: 1988-05-10
Application No.: CA494697
Application date: 1985-11-06
Applicant: IBM
Inventor: BAHL LALIT R , MERCER ROBERT L , DEGENNARO STEVEN V
IPC: G10L5/00
Abstract: The invention herein provides, in a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) at least one transition and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of: forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.
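The simplification step in this abstract, replacing the per-transition label probabilities with one value per label, might be sketched as below. The abstract does not say how the single value is chosen; taking the maximum over the transitions that can generate the label is an assumption made here for illustration. The subsequent match computation is not modeled.

```python
from typing import Dict


def simplify_phone_machine(
    label_probs: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Collapse {transition: {label: probability}} into {label: single value}.

    Assumption: the single value is the largest actual probability of the
    label over all transitions at which that label may be generated.
    """
    simplified: Dict[str, float] = {}
    for per_transition in label_probs.values():
        for label, prob in per_transition.items():
            simplified[label] = max(simplified.get(label, 0.0), prob)
    return simplified


# Toy phone machine with two transitions and two labels.
phone = {
    "t1": {"A": 0.2, "B": 0.8},
    "t2": {"A": 0.5, "B": 0.5},
}
simple = simplify_phone_machine(phone)
```

After simplification, a label has the same probability wherever it can be generated in the phone machine, so a match computation no longer needs to track which transition produced it.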
-
Publication No.: CA982265A
Publication date: 1976-01-20
Application No.: CA173798
Application date: 1973-06-12
Applicant: IBM
Inventor: BAHL LALIT R , GROSSMAN DAVID D , BARNEA DANIEL I , KOBAYASHI HISASHI
Abstract: A system for compacting digital data by means of prediction error coding. Prediction for each unknown bit is a function of previous detected levels in the data stream. A plurality of n-bit up-down counters, each associated with one of the possible states of prediction for an unknown bit, is utilized to arrive at a prediction of the level of the unknown bit. If the value found in the up-down counter is above a pre-specified level, a prediction will be made that the unknown bit is a one, otherwise, the prediction is zero.
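The prediction scheme in this abstract can be sketched in Python as follows. The context length, the counter width, the starting counter value, and the threshold are all illustrative assumptions, since the abstract only says that each prediction state has an n-bit up-down counter compared against a pre-specified level.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]  # the most recent bits, used as the prediction state


def _predict(counters: Dict[State, int], state: State, threshold: int) -> int:
    """Predict 1 if the state's counter is above the threshold, else 0."""
    return 1 if counters.get(state, threshold) > threshold else 0


def _update(counters: Dict[State, int], state: State, bit: int,
            threshold: int, max_count: int) -> None:
    """Saturating up-down counter: count up on a 1, down on a 0."""
    count = counters.get(state, threshold)
    counters[state] = min(max_count, count + 1) if bit else max(0, count - 1)


def encode(bits: List[int], context_len: int = 2, counter_bits: int = 3) -> List[int]:
    """Return the prediction-error stream (1 where the prediction was wrong)."""
    max_count = (1 << counter_bits) - 1
    threshold = max_count // 2
    counters: Dict[State, int] = {}
    errors = []
    for i, bit in enumerate(bits):
        state = tuple(bits[max(0, i - context_len):i])
        errors.append(bit ^ _predict(counters, state, threshold))
        _update(counters, state, bit, threshold, max_count)
    return errors


def decode(errors: List[int], context_len: int = 2, counter_bits: int = 3) -> List[int]:
    """Rebuild the original bits by running the same predictor on decoded bits."""
    max_count = (1 << counter_bits) - 1
    threshold = max_count // 2
    counters: Dict[State, int] = {}
    bits: List[int] = []
    for i, err in enumerate(errors):
        state = tuple(bits[max(0, i - context_len):i])
        bit = err ^ _predict(counters, state, threshold)
        bits.append(bit)
        _update(counters, state, bit, threshold, max_count)
    return bits


data = [1] * 20
error_stream = encode(data)
restored = decode(error_stream)
```

For predictable input the error stream is mostly zeros, which is what makes it easier to compact than the raw data. The compaction of the error stream itself (for example by run-length coding) is outside this sketch.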
-
Publication No.: DE69224253T2
Publication date: 1998-08-13
Application No.: DE69224253
Application date: 1992-08-31
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , EPSTEIN EDWARD ADAM , LUCASSEN JOHN M , NAHAMOO DAVID , PICHENY MICHAEL ALAN
-
Publication No.: CA2072721A1
Publication date: 1993-04-04
Application No.: CA2072721
Application date: 1992-06-29
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , EPSTEIN EDWARD A , LUCASSEN JOHN M , NAHAMOO DAVID , PICHENY MICHAEL A