-
1.
Publication No.: CA2077728A1
Publication Date: 1993-06-06
Application No.: CA2077728
Application Date: 1992-09-08
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , NADAS ARTHUR J , NAHAMOO DAVID , PICHENY MICHAEL A
Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The feature value of each feature vector signal is compared with the parameter values of the prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
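The coding step described in this abstract (compare each feature vector with every stored prototype and output the identifier of the best match) can be illustrated with a minimal sketch. The function name `label_feature_vectors`, the toy prototypes, and the use of squared Euclidean distance as the match score are assumptions made for illustration; the abstract does not specify the scoring function.

```python
def label_feature_vectors(feature_vectors, prototypes):
    """Return the identifier of the closest prototype for each feature vector.

    `prototypes` maps an identification value to a parameter vector.
    Squared Euclidean distance is an assumed stand-in for the match score.
    """
    labels = []
    for vec in feature_vectors:
        best_id = min(
            prototypes,
            key=lambda pid: sum((a - b) ** 2 for a, b in zip(vec, prototypes[pid])),
        )
        labels.append(best_id)
    return labels


# Hypothetical two-dimensional prototypes and feature vectors.
prototypes = {"P1": [0.0, 0.0], "P2": [1.0, 1.0]}
labels = label_feature_vectors([[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]], prototypes)
print(labels)
```

Each utterance is thereby reduced to a sequence of prototype identifiers, one per time interval.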
-
2.
Publication No.: CA2051602A1
Publication Date: 1992-04-24
Application No.: CA2051602
Application Date: 1991-09-17
Applicant: IBM
Inventor: BROWN PETER F , DE GENNARO STEVEN V , DE SOUZA PETER V , EPSTEIN MARK E
-
3.
Publication No.: CA2089786C
Publication Date: 1996-12-10
Application No.: CA2089786
Application Date: 1993-02-18
Applicant: IBM
Inventor: BAHL LALIT R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , PICHENY MICHAEL A
Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
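The two-pass selection described in this abstract can be sketched as follows. This is only an outline under stated assumptions: `two_pass_prune` and its parameters are hypothetical names, the three scoring callables stand in for the acoustic and language models (which the abstract does not detail), and higher scores are assumed to be better.

```python
def two_pass_prune(hypotheses, initial_score, predict_next_word, revised_score,
                   initial_size, reduced_size):
    """Keep the best hypotheses by initial score, then rescore them in context."""
    # First pass: rank by the initial score and keep the initial subset.
    ranked = sorted(hypotheses, key=initial_score, reverse=True)
    initial_subset = ranked[:initial_size]

    # Second pass: estimate the most likely following word for each survivor
    # and rescore with a candidate-word model that depends on that word.
    rescored = []
    for hyp in initial_subset:
        next_word = predict_next_word(hyp)
        rescored.append((revised_score(hyp, next_word), hyp))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [hyp for _, hyp in rescored[:reduced_size]]


# Toy lookup tables standing in for real models (illustrative values only).
initial = {("the", "cat"): 0.9, ("the", "cap"): 0.8, ("a", "cat"): 0.3}
next_words = {("the", "cat"): "sat", ("the", "cap"): "sat"}
revised = {(("the", "cat"), "sat"): 0.95, (("the", "cap"), "sat"): 0.4}

result = two_pass_prune(
    list(initial), initial.get, next_words.get,
    lambda hyp, word: revised[(hyp, word)],
    initial_size=2, reduced_size=1,
)
print(result)
```

Only the hypotheses that survive the first pass incur the cost of the context-dependent rescoring.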
-
4.
Publication No.: CA2089786A1
Publication Date: 1993-10-25
Application No.: CA2089786
Application Date: 1993-02-18
Applicant: IBM
Inventor: BAHL LALIT R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , PICHENY MICHAEL A
Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
-
5.
Publication No.: CA2068041A1
Publication Date: 1993-01-17
Application No.: CA2068041
Application Date: 1992-05-05
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , DE SOUZA PETER V , NAHAMOO DAVID , PICHENY MICHAEL A
Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. Means are provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the cluster. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
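The clustering and prototype-building steps in this abstract can be sketched under simplifying assumptions. The alignment of each feature vector to a (word segment, position, elementary model) triple is taken as given here, whereas the abstract derives it from an estimated path through the training script model. Each partition value is assumed to be a single cluster mean, the simplest possible "combination". The function names are hypothetical.

```python
from collections import defaultdict


def cluster_means(feature_vectors, alignment):
    """Average the vectors aligned to the same (segment, position, model) key."""
    groups = defaultdict(list)
    for vec, key in zip(feature_vectors, alignment):
        groups[key].append(vec)
    return {
        key: [sum(column) / len(vecs) for column in zip(*vecs)]
        for key, vecs in groups.items()
    }


def prototypes_by_model(means):
    """Collect the cluster means of each elementary model as partition values."""
    protos = defaultdict(list)
    for (_segment, _position, model), mean in sorted(means.items()):
        protos[model].append(mean)
    return dict(protos)


# Hypothetical data: three feature vectors, all produced by elementary model "m1".
vectors = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
alignment = [("w1", 0, "m1"), ("w1", 0, "m1"), ("w2", 1, "m1")]

means = cluster_means(vectors, alignment)
protos = prototypes_by_model(means)
print(protos)
```

In this toy case the prototype for "m1" ends up with two partition values, one for each location in which the model occurs.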
-
6.
Publication No.: CA2060591A1
Publication Date: 1992-09-23
Application No.: CA2060591
Application Date: 1992-02-04
Applicant: IBM
Inventor: BAHL LALIT R , PICHENY MICHAEL A , NAHAMOO DAVID , DE SOUZA PETER V
Abstract: The present invention is related to speech recognition and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique of the present invention groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound whose group of prototypes, or at least one of whose prototypes, has a combined value that most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
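A minimal sketch of labeling by sound class when each class is represented by several prototypes is given below. The rule used here (a class is scored by its single closest prototype, with squared Euclidean distance) is an assumption chosen for brevity. The abstract also allows a combined value over the group of prototypes and does not fix the distance measure. The class names and vectors are invented for illustration.

```python
def classify(vec, class_prototypes):
    """Return the identifier of the class with the closest prototype to `vec`."""
    def class_distance(class_id):
        # Assumed scoring rule: distance to the nearest prototype of the class.
        return min(
            sum((a - b) ** 2 for a, b in zip(vec, proto))
            for proto in class_prototypes[class_id]
        )
    return min(class_prototypes, key=class_distance)


# Hypothetical sound classes, each represented by two prototypes.
class_prototypes = {
    "AA": [[0.0, 0.0], [0.0, 2.0]],
    "IY": [[5.0, 5.0], [6.0, 5.0]],
}

label_a = classify([0.2, 1.8], class_prototypes)
label_b = classify([5.5, 5.1], class_prototypes)
print(label_a, label_b)
```

Using several prototypes per class lets one class cover regions of the feature space that a single prototype could not represent well.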
-
7.
Publication No.: CA2068041C
Publication Date: 1996-10-29
Application No.: CA2068041
Application Date: 1992-05-05
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , DE SOUZA PETER V , NAHAMOO DAVID , PICHENY MICHAEL A
Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. Means are provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the cluster. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
-
8.
Publication No.: CA2060591C
Publication Date: 1996-08-13
Application No.: CA2060591
Application Date: 1992-02-04
Applicant: IBM
Inventor: BAHL LALIT R , PICHENY MICHAEL A , NAHAMOO DAVID , DE SOUZA PETER V
Abstract: The present invention is related to speech recognition and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique of the present invention groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound whose group of prototypes, or at least one of whose prototypes, has a combined value that most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
-
9.
Publication No.: CA2077728C
Publication Date: 1996-08-06
Application No.: CA2077728
Application Date: 1992-09-08
Applicant: IBM
Inventor: BAHL LALIT R , BELLEGARDA JEROME R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , NADAS ARTHUR J , NAHAMOO DAVID , PICHENY MICHAEL A
Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The feature value of each feature vector signal is compared with the parameter values of the prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
-
10.
Publication No.: CA2073991A1
Publication Date: 1993-04-24
Application No.: CA2073991
Application Date: 1992-07-16
Applicant: IBM
Inventor: BAHL LALIT R , DE SOUZA PETER V , GOPALAKRISHNAN PONANI S , PICHENY MICHAEL A