31.
    Invention patent
    Unknown

    Publication No.: DE69720087D1

    Publication date: 2003-04-30

    Application No.: DE69720087

    Application date: 1997-01-17

    Applicant: IBM

    Abstract: A method and apparatus have been devised for removing the effect of background music or noise from speech input to a speech recognizer, so as to improve recognition accuracy. Samples of pure music or noise related to the background music or noise that corrupts the speech input are used to reduce the effect of the background in speech recognition. The pure music and noise samples can be obtained in a variety of ways. The music- or noise-corrupted speech input is divided into overlapping segments and then processed in two phases. First, the best-matching pure music or noise segment is aligned with each speech segment. Then a linear filter is built for each segment to remove the effect of the background music or noise from the speech input, and the overlapping segments are averaged to improve the signal-to-noise ratio. The resulting acoustic output can then be fed to a speech recognizer.
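The two-phase procedure in this abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not IBM's implementation. Signals are plain lists of samples, and the reference recording of pure music or noise is assumed to be at least one segment long. The per-segment "linear filter" is reduced to a single least-squares gain applied to the best-aligned reference window, because the abstract does not specify the filter's form. The function names best_aligned_filter and remove_background are hypothetical.

```python
def _dot(a, b):
    """Inner product of two equal-length sample lists."""
    return sum(x * y for x, y in zip(a, b))


def best_aligned_filter(segment, reference):
    """Slide over the reference recording and return (residual, offset, gain) for the
    window that best explains the segment after least-squares scaling.

    Assumption: the 'linear filter' is a single gain, a simplification of a general
    FIR filter. Assumes len(reference) >= len(segment).
    """
    n = len(segment)
    best = None
    for offset in range(len(reference) - n + 1):
        window = reference[offset:offset + n]
        energy = _dot(window, window)
        gain = _dot(segment, window) / energy if energy > 0 else 0.0
        residual = [s - gain * w for s, w in zip(segment, window)]
        error = _dot(residual, residual)
        if best is None or error < best[0]:
            best = (error, residual, offset, gain)
    _, residual, offset, gain = best
    return residual, offset, gain


def remove_background(corrupted, reference, seg_len, hop):
    """Clean overlapping segments independently, then average the overlaps sample by sample."""
    total = [0.0] * len(corrupted)
    count = [0] * len(corrupted)
    for start in range(0, len(corrupted) - seg_len + 1, hop):
        residual, _, _ = best_aligned_filter(corrupted[start:start + seg_len], reference)
        for k, value in enumerate(residual):
            total[start + k] += value
            count[start + k] += 1
    # Samples not covered by any segment are passed through unchanged.
    return [t / c if c else x for t, c, x in zip(total, count, corrupted)]
```

In this sketch, averaging the overlapping residuals plays the role of the overlap averaging described above. A practical system would more likely use multi-tap or frequency-domain filters.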

    32.
    Invention patent
    Unknown

    Publication No.: DE69518723D1

    Publication date: 2000-10-12

    Application No.: DE69518723

    Application date: 1995-06-21

    Applicant: IBM

    Abstract: A method for estimating the probability of phone boundaries, as well as the accuracy of the acoustic modelling, in order to cut down the search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The invention includes a microphone for converting an utterance into an electrical signal. The signal from the microphone is processed by an acoustic processor and label matcher, which finds the best-matched acoustic label prototype in the acoustic label prototype store. A probability distribution over phone boundaries is then produced for every time frame using the first decision tree described in the invention. These probabilities are compared to a threshold, and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. The second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time. Using this rank together with the phone scores and phone ranks computed earlier, a shortlist of allowed phones is made up for every time frame. This information is used to select a subset of the stored acoustic word models. A fast acoustic word match processor then matches the label string from the acoustic processor against this subset of abridged acoustic word models to produce an utterance signal. The utterance signal output by the fast acoustic word match processor comprises at least one word. In general, however, the fast acoustic word match processor will output a number of candidate words. Each word signal produced by the fast acoustic word match processor is input into a word context matcher, which compares the word context to stored language models and outputs at least one candidate word.
    From the recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against the stored detailed acoustic word models and outputs a word string corresponding to an utterance.
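The thresholding and shortlisting steps can be illustrated with a short Python sketch. It makes three assumptions. The first decision tree has already produced a per-frame boundary probability. Acoustic scores are log-likelihoods, so higher is better. The second decision tree's worst-case rank is given as an integer. Neither tree is modelled here, and all function names are hypothetical.

```python
def hypothesize_boundaries(boundary_probs, threshold):
    """Frames whose boundary probability exceeds the threshold become hypothesized phone boundaries."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def rank_phones(phone_scores):
    """Map each phone to its rank by acoustic score; rank 1 is the best (highest) score."""
    ordered = sorted(phone_scores, key=phone_scores.get, reverse=True)
    return {phone: rank for rank, phone in enumerate(ordered, start=1)}


def shortlist(phone_scores, worst_case_rank):
    """Allowed phones: those ranked no worse than the predicted worst-case rank of the correct phone."""
    ranks = rank_phones(phone_scores)
    return sorted(phone for phone, rank in ranks.items() if rank <= worst_case_rank)
```

Under these assumptions, a larger predicted worst-case rank yields a longer shortlist. That makes the later fast acoustic match more conservative but also more expensive.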

    33.
    Invention patent
    Unknown

    Publication No.: DE69425412D1

    Publication date: 2000-09-07

    Application No.: DE69425412

    Application date: 1994-10-21

    Applicant: IBM

    Abstract: An automatic handwriting recognition system in which each written (chirographic) manifestation of each character is represented by a statistical model (called a hidden Markov model). The system implements a method that entails sampling a pool of independent writers and deriving, for each particular character (allograph), a hidden Markov model that is independent of any particular writer. The HMMs are used to derive a chirographic label alphabet that is also independent of any particular writer. This is accomplished during what is described as the training phase of the system. The alphabet is constructed using supervised techniques. That is, information learned in the training phase is used to adjust the result according to a statistical algorithm (such as a Viterbi alignment), yielding a cost-efficient recognition tool. Once such an alphabet is constructed, a new set of HMMs can be defined that more accurately reflects parameter tying across writers. The system recognizes handwriting by applying an efficient hierarchical decoding strategy that employs a fast match and a detailed match function, thereby making recognition cost-effective.
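The hierarchical fast-match/detailed-match strategy can be sketched generically in Python. The scoring functions are passed in as parameters because the abstract does not define them. In the described system they would be HMM-based scores; the toy scorers used for testing are placeholders. The names hierarchical_decode and shortlist_size are hypothetical.

```python
from typing import Callable, List, Sequence

# A scorer maps (observation, candidate) to a score; higher means a better match.
Scorer = Callable[[Sequence[str], str], float]


def hierarchical_decode(
    observation: Sequence[str],
    candidates: List[str],
    fast_score: Scorer,
    detailed_score: Scorer,
    shortlist_size: int,
) -> str:
    """Two-pass decoding: a cheap fast match keeps the top candidates, then a more
    expensive detailed match picks the winner among the survivors only."""
    survivors = sorted(
        candidates, key=lambda c: fast_score(observation, c), reverse=True
    )[:shortlist_size]
    return max(survivors, key=lambda c: detailed_score(observation, c))
```

The cost saving comes from calling detailed_score on only shortlist_size candidates rather than the whole vocabulary.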

    34.
    Invention patent
    Unknown

    Publication No.: DE69221403D1

    Publication date: 1997-09-11

    Application No.: DE69221403

    Application date: 1992-05-20

    Applicant: IBM

    Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals, producing a series of feature vector signals that represent the feature values of the utterance. Means are provided for estimating at least one path through the training script model that would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model that would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the cluster. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
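A minimal Python sketch of the clustering and prototype-building steps follows. It assumes the path estimation has already tagged each feature vector with three things: its word segment, its position within that segment, and its elementary model. Each cluster's mean is used directly as one partition value, which is the simplest "combination" of cluster values. As a result, an elementary model seen at only one location gets a single partition in this sketch, fewer than the two the abstract requires. The function names are hypothetical.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise average of equal-length feature vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def build_prototypes(aligned_frames):
    """aligned_frames: iterable of (feature_vector, word_segment_id, position, elementary_model).

    Step 1: cluster frames by (word segment, position, elementary model), the key
            the alignment path assigns to each frame.
    Step 2: average each cluster to obtain its cluster value.
    Step 3: collect the cluster values of all clusters sharing an elementary model
            as that model's partition values (the model name serves as identifier).
    """
    clusters = defaultdict(list)
    for vector, segment, position, model in aligned_frames:
        clusters[(segment, position, model)].append(vector)

    prototypes = defaultdict(list)
    for (segment, position, model), vectors in sorted(clusters.items()):
        prototypes[model].append(mean(vectors))
    return dict(prototypes)
```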

    35.
    Invention patent
    Unknown

    Publication No.: DE69028842D1

    Publication date: 1996-11-14

    Application No.: DE69028842

    Application date: 1990-12-13

    Applicant: IBM

    Abstract: A method and apparatus for modeling a word by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. The conditional probability of occurrence of the first primitive elemental model is then estimated, given the occurrence of the composite elemental model and of the observed sequence of component sounds. This estimate is computed from the prior values of the weighting factors and from the values of the parameters of the first and second primitive elemental models. A posterior value for the first weighting factor is then estimated from this conditional probability. Word models are constructed from composite elemental models, which are in turn constructed from primitive elemental models. As a result, the resulting word model can closely represent many variations in the pronunciation of a word. Because the set of primitive elemental models is small relative to the large vocabulary of words, the models can be trained to the voice of a new speaker by having the new speaker utter only a small subset of the words in the vocabulary.
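The weighting-factor re-estimation can be illustrated as one expectation-maximization style update. This sketch assumes a composite model built from exactly two primitive models, each reduced to a discrete output distribution over acoustic labels. It also assumes the observations are already aligned to the composite model. The abstract's state-level combination is not modelled, and reestimate_weight is a hypothetical name.

```python
def reestimate_weight(w1, p1, p2, observations):
    """One EM-style update of the first weighting factor of a two-component composite model.

    w1: prior value of the first weighting factor (the second is assumed to be 1 - w1).
    p1, p2: dicts mapping an observed label to its output probability under each primitive.
    observations: labels observed across the speakers' utterances of the word.
    Returns the posterior value of w1, or the prior if no label is explained by either model.
    """
    total, used = 0.0, 0
    for label in observations:
        a = w1 * p1.get(label, 0.0)
        b = (1.0 - w1) * p2.get(label, 0.0)
        if a + b > 0.0:
            total += a / (a + b)  # P(first primitive | composite model, observed label)
            used += 1
    return total / used if used else w1
```

Repeating the update over many speakers' utterances shifts weight toward whichever primitive better explains the observed pronunciations.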

    FAST ALGORITHM FOR DERIVING ACOUSTIC PROTOTYPES FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: CA2068041C

    Publication date: 1996-10-29

    Application No.: CA2068041

    Application date: 1992-05-05

    Applicant: IBM

    Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals, producing a series of feature vector signals that represent the feature values of the utterance. Means are provided for estimating at least one path through the training script model that would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model that would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the cluster. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.

    SPEAKER-INDEPENDENT LABEL CODING APPARATUS

    Publication No.: CA2060591C

    Publication date: 1996-08-13

    Application No.: CA2060591

    Application date: 1992-02-04

    Applicant: IBM

    Abstract: The present invention relates to speech recognition, and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the values of the respective prototypes, or of the subclasses of the prototypes. A class of sound may match through its group of prototypes or through at least one of its prototypes. The class whose combined value most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
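The labelling rule can be sketched in Python as nearest-prototype classification with several prototypes per class. The sketch assumes Euclidean distance as the match measure. It scores each class by its single best-matching prototype, which is one reading of "at least one of the prototypes"; the abstract leaves the combination rule open. label_feature_vector is a hypothetical name.

```python
import math


def label_feature_vector(feature, classes):
    """classes: dict mapping a class identifier to a list of prototype vectors (one or more each).

    Each class is scored by its closest prototype (smallest Euclidean distance), and the
    feature vector receives the identifier of the best-scoring class.
    """
    def class_distance(prototypes):
        return min(math.dist(feature, p) for p in prototypes)

    return min(classes, key=lambda label: class_distance(classes[label]))
```

Allowing several prototypes per class lets one sound class cover disjoint regions of feature space, which a single centroid per class cannot do.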

    SPEECH CODING APPARATUS HAVING SPEAKER DEPENDENT PROTOTYPES GENERATED FROM NONUSER REFERENCE DATA

    Publication No.: CA2077728C

    Publication date: 1996-08-06

    Application No.: CA2077728

    Application date: 1992-09-08

    Applicant: IBM

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The closeness of the feature vector signal to the parameter values of the prototype vector signals is compared to obtain prototype match scores for the feature vector signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
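A minimal Python sketch of prototype generation and coding follows. The abstract does not say how reference vectors are transformed, so a per-dimension affine map with given scale and offset is assumed here; estimating that map is not shown. Each label's prototype is simply the mean of its pooled synthesized and measured vectors. All names are hypothetical.

```python
import math
from collections import defaultdict


def transform(vector, scale, offset):
    """Hypothetical per-dimension affine map from the reference speakers' space to the new speaker's."""
    return [s * v + o for v, s, o in zip(vector, scale, offset)]


def speaker_dependent_prototypes(reference, measured, scale, offset):
    """reference, measured: lists of (label, feature_vector) training pairs.

    Reference vectors are transformed into synthesized training vectors, pooled with the
    new speaker's measured vectors, and averaged per label to form one prototype each.
    """
    pooled = defaultdict(list)
    for label, vec in reference:
        pooled[label].append(transform(vec, scale, offset))
    for label, vec in measured:
        pooled[label].append(vec)
    return {label: [sum(c) / len(vecs) for c in zip(*vecs)] for label, vecs in pooled.items()}


def encode(feature, prototypes):
    """Output the identifier of the prototype with the best (smallest-distance) match score."""
    return min(prototypes, key=lambda label: math.dist(feature, prototypes[label]))
```

Pooling lets labels the new speaker never uttered (P2 in a small test) still receive a prototype from the synthesized data alone.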

    39.
    Invention patent
    Unknown

    Publication No.: DE3874049T2

    Publication date: 1993-04-08

    Application No.: DE3874049

    Application date: 1988-06-16

    Applicant: IBM

    Abstract: Apparatus and method for training the statistics of a Markov model speech recognizer to a subsequent speaker who utters part of a training text, after the recognizer has been trained on the statistics of a reference speaker who utters the full training text. Labels generated by an acoustic processor in response to uttered speech serve as outputs for the Markov models. For the subsequent speaker, whose training data is sparse, the apparatus and method determine label output probabilities at transitions in the corresponding Markov models. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries. These entries have values indicating the similarity between the lth label output of the subsequent speaker and the kth label output of the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities, which have smoothed probability distributions. "Basic" label output probabilities are computed by conventional methodology from the label outputs generated when the subsequent speaker utters the shortened training text. These are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
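The smoothing chain in this abstract can be written out for a single transition's label output distribution. In this sketch, confusion[k][out] is assumed to hold the probability that the subsequent speaker produces label out where the reference speaker produced label k. The two interpolation weights are free parameters, because the abstract does not say how they are chosen. Function names are hypothetical.

```python
def reparameterize(ref_probs, confusion):
    """P_new(out) = sum over k of confusion[k][out] * ref_probs[k].

    ref_probs[k]: reference speaker's output probability for label k at one transition.
    confusion[k][out]: assumed probability of the subsequent speaker's label `out`
    given the reference speaker's label k (each row sums to 1).
    """
    n_out = len(confusion[0])
    return [
        sum(confusion[k][out] * ref_probs[k] for k in range(len(ref_probs)))
        for out in range(n_out)
    ]


def interpolate(p, q, weight):
    """Element-wise linear average: weight * p + (1 - weight) * q."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p, q)]


def adapt_output_probs(ref_probs, confusion, initial_probs, basic_probs, smooth_weight, basic_weight):
    """Re-parameterize, smooth against the initialized probabilities, then average with the basic estimates."""
    smoothed = interpolate(reparameterize(ref_probs, confusion), initial_probs, smooth_weight)
    return interpolate(basic_probs, smoothed, basic_weight)
```

Because every step is a convex combination of distributions, the result remains a valid probability distribution whenever the inputs are.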

    40.
    Invention patent
    Unknown

    Publication No.: DE3878071D1

    Publication date: 1993-03-18

    Application No.: DE3878071

    Application date: 1988-05-31

    Applicant: IBM

    Abstract: In a speech processor system, prototype vectors of speech are generated by an acoustic processor under reference noise and known ambient conditions. Feature vectors of speech are generated during varying noise and other ambient and recording conditions. In such a system, normalized vectors are generated to reflect the form the feature vectors would have if generated under the reference conditions. The normalized vectors are generated by:
    (a) applying an operator function A_i to a set of feature vectors x occurring at or before time interval i to yield a normalized vector y_i = A_i(x);
    (b) determining a distance error vector E_i by which the normalized vector y_i is projectively moved toward the prototype vector closest to it;
    (c) updating the operator function for the next time interval to correspond to the most recently determined distance error vector; and
    (d) incrementing i to the next time interval and repeating steps (a) through (d), with the most recently updated operator function applied to the feature vector for the incremented value of i.
    Over successive time intervals, successive normalized vectors are generated based on a successively updated operator function. Each normalized vector is associated with the prototype closest to it. The string of normalized vectors, the string of associated prototypes (or their respective label identifiers), or both provide the output of the acoustic processor.
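Steps (a) through (d) can be sketched in Python with a deliberately simple operator. The sketch makes three assumptions. A_i(x) subtracts a bias vector from the current feature vector only, although the abstract allows A_i to use earlier vectors too. Euclidean distance is used to find the closest prototype. The bias is moved by a fixed fraction step of the error vector E_i. normalize_stream and step are hypothetical names.

```python
import math


def closest_prototype(vector, prototypes):
    """Index of the prototype nearest to the vector (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda j: math.dist(vector, prototypes[j]))


def normalize_stream(features, prototypes, step=0.1):
    """Normalize feature vectors with the assumed operator A_i(x) = x - b_i.

    Returns (normalized vectors, index of the closest prototype for each).
    """
    bias = [0.0] * len(prototypes[0])
    normalized, labels = [], []
    for x in features:  # (d) advance to the next time interval
        y = [xi - bi for xi, bi in zip(x, bias)]               # (a) y_i = A_i(x)
        j = closest_prototype(y, prototypes)
        error = [pj - yj for pj, yj in zip(prototypes[j], y)]  # (b) E_i toward closest prototype
        bias = [bi - step * ei for bi, ei in zip(bias, error)]  # (c) update operator for i + 1
        normalized.append(y)
        labels.append(j)
    return normalized, labels
```

Suppose the stream is offset from one prototype by a constant. Then the bias converges to that offset, and the normalized vectors approach the prototype.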
