22.
    Invention patent
    Unknown

    Publication No.: DE69221403T2

    Publication date: 1998-02-19

    Application No.: DE69221403

    Filing date: 1992-05-20

    Applicant: IBM

    Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. Means are provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
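
    The clustering step described in this abstract can be illustrated with a minimal Python sketch. It assumes the path alignment has already been done, so every feature vector arrives tagged with its word segment, location, and elementary model. The function name `build_prototypes`, the tuple layout, and the choice to use each cluster mean directly as one partition value are assumptions made for illustration; they are not taken from the patent.

```python
from collections import defaultdict


def build_prototypes(aligned_vectors):
    """Cluster aligned feature vectors and group cluster means by elementary model.

    aligned_vectors: iterable of (word_segment, location, elementary_model, vector)
    tuples. The alignment that produces these tags is assumed to exist already.
    """
    # One cluster per (word segment, location, elementary model) triple.
    clusters = defaultdict(list)
    for segment, location, model, vector in aligned_vectors:
        clusters[(segment, location, model)].append(vector)

    prototypes = defaultdict(list)
    for (segment, location, model), vectors in sorted(clusters.items()):
        # Cluster value: component-wise average of the vectors in the cluster.
        mean = [sum(column) / len(vectors) for column in zip(*vectors)]
        # Assumption: each partition value is a single cluster mean.
        prototypes[model].append(mean)
    return dict(prototypes)


aligned = [
    ("w1", 0, "m1", [1.0, 3.0]),
    ("w1", 0, "m1", [3.0, 5.0]),
    ("w2", 1, "m1", [10.0, 10.0]),
]
prototypes = build_prototypes(aligned)
```

    In this toy input, elementary model `m1` occurs in two locations, so its prototype ends up with two partitions, one per cluster.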

    CONTEXT-DEPENDENT SPEECH RECOGNIZER USING ESTIMATED NEXT WORD CONTEXT

    Publication No.: CA2089786C

    Publication date: 1996-12-10

    Application No.: CA2089786

    Filing date: 1993-02-18

    Applicant: IBM

    Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
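
    The two-pass selection in this abstract can be sketched as follows. The scoring and prediction functions are passed in as plain callables because the abstract does not specify them; `rescore_with_next_word` and all parameter names are hypothetical, and the toy scores below are invented purely to exercise the control flow.

```python
def rescore_with_next_word(hypotheses, initial_score, predict_next_word,
                           revised_score, initial_size, reduced_size):
    """Return the reduced subset of hypotheses after context-dependent rescoring."""
    # Pass 1: keep the hypotheses with the best initial scores.
    initial_subset = sorted(hypotheses, key=initial_score, reverse=True)[:initial_size]
    # Pass 2: estimate the most likely next word for each survivor and
    # rescore with a model that depends on that estimate.
    rescored = [(revised_score(h, predict_next_word(h)), h) for h in initial_subset]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in rescored[:reduced_size]]


# Toy data (illustrative only): hypotheses are tuples of words.
hypotheses = [("the", "cat"), ("the", "cap"), ("a", "cat")]
initial = {("the", "cat"): 0.5, ("the", "cap"): 0.6, ("a", "cat"): 0.1}
next_word = {("the", "cat"): "sat", ("the", "cap"): "fits"}
revised = {(("the", "cat"), "sat"): 0.9, (("the", "cap"), "fits"): 0.4}

best = rescore_with_next_word(
    hypotheses, initial.get, next_word.get,
    lambda h, w: revised[(h, w)], initial_size=2, reduced_size=1,
)
```

    Only the hypotheses that survive the first pass are rescored, so the more expensive context-dependent model runs on a small subset.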

    CONTEXT-DEPENDENT SPEECH RECOGNIZER USING ESTIMATED NEXT WORD CONTEXT

    Publication No.: CA2089786A1

    Publication date: 1993-10-25

    Application No.: CA2089786

    Filing date: 1993-02-18

    Applicant: IBM

    Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.

    TRAINING OF MARKOV MODELS USED IN A SPEECH RECOGNITION SYSTEM

    Publication No.: CA1262188A

    Publication date: 1989-10-03

    Application No.: CA528790

    Filing date: 1987-02-02

    Applicant: IBM

    Abstract: In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input, wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model, wherein each probabilistic model has transition probability items and output probability items, and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biassing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts, from which probability items are derived, are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
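
    The count adjustment described in this abstract can be sketched in a few lines of Python. The abstract does not give the update rule, so the linear `step`, the positive `floor` that keeps counts from reaching zero, and the function names are all assumptions made for illustration.

```python
def adjust_counts(counts, correct_events, incorrect_events, step=1.0, floor=1e-6):
    """Raise counts for events seen with the correct word, lower them for incorrect words.

    counts: dict mapping a probability event (e.g. a transition or label output) to its count.
    correct_events / incorrect_events: dicts mapping events to how often they occurred.
    step and floor are assumed tuning values, not taken from the patent.
    """
    adjusted = dict(counts)
    for event, occurrences in correct_events.items():
        adjusted[event] = adjusted.get(event, 0.0) + step * occurrences
    for event, occurrences in incorrect_events.items():
        adjusted[event] = max(floor, adjusted.get(event, 0.0) - step * occurrences)
    return adjusted


def counts_to_probabilities(counts):
    """Derive probability items by normalizing one group of competing counts."""
    total = sum(counts.values())
    return {event: value / total for event, value in counts.items()}


counts = {"t1": 4.0, "t2": 4.0}
adjusted = adjust_counts(counts, correct_events={"t1": 2}, incorrect_events={"t2": 2})
probabilities = counts_to_probabilities(adjusted)
```

    Here the two events start out equally likely; after one adjustment the event tied to the correct word holds three quarters of the probability mass.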

    SPEECH RECOGNITION EMPLOYING A SET OF MARKOV MODELS THAT INCLUDES MARKOV MODELS REPRESENTING TRANSITIONS TO AND FROM SILENCE

    Publication No.: CA1259411A

    Publication date: 1989-09-12

    Application No.: CA504807

    Filing date: 1986-03-24

    Applicant: IBM

    Abstract: The present invention relates to apparatus and method for constructing word baseforms which can be matched against a string of generated acoustic labels which includes: forming a set of phonetic phone machines, wherein each phone machine has (i) a plurality of states, (ii) a plurality of transitions each of which extends from a state to a state, (iii) a stored probability for each transition, and (iv) stored label output probabilities, each label output probability corresponding to the probability of said each phone machine producing a corresponding label; wherein said set of phonetic machines is formed to include a subset of onset phone machines, the stored probabilities of each onset phone machine corresponding to at least one phonetic element being uttered at the beginning of a speech segment; and wherein said set of phonetic machines is formed to include a subset of trailing phone machines, the stored probabilities of each trailing phone machine corresponding to at least one single phonetic element being uttered at the end of a speech segment. Word baseforms are constructed by concatenating phone machines selected from the set.

    APPARATUS AND METHOD FOR PERFORMING ACOUSTIC MATCHING

    Publication No.: CA1236577A

    Publication date: 1988-05-10

    Application No.: CA494697

    Filing date: 1985-11-06

    Applicant: IBM

    Abstract: The invention herein provides, in a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) at least one transition and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of: forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.
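
    The simplification step in this abstract, replacing the per-transition probabilities of a label with one value per phone machine, can be sketched as below. The abstract does not say how the single value is chosen; taking the maximum over the transitions that can generate the label is an assumption made here for illustration, and the dictionary layout and function name are likewise hypothetical. The subsequent match computation is not shown.

```python
def simplify_phone_machine(label_probs_by_transition):
    """Collapse per-transition label probabilities to one value per label.

    label_probs_by_transition: dict mapping a transition id to a dict of
    {label: probability of generating that label at that transition}.
    Assumption: the single replacement value is the maximum over the
    transitions at which the label may be generated.
    """
    simplified = {}
    for label_probs in label_probs_by_transition.values():
        for label, probability in label_probs.items():
            simplified[label] = max(simplified.get(label, 0.0), probability)
    return simplified


phone_machine = {
    "t1": {"A": 0.5, "B": 0.5},
    "t2": {"A": 0.25, "C": 0.75},
}
simplified = simplify_phone_machine(phone_machine)
```

    After simplification a label lookup no longer depends on the transition, which is what makes the match cheaper to evaluate.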

    FINITE MEMORY ADAPTIVE PREDICTOR
    28.
    Invention patent

    Publication No.: CA982265A

    Publication date: 1976-01-20

    Application No.: CA173798

    Filing date: 1973-06-12

    Applicant: IBM

    Abstract: A system for compacting digital data by means of prediction error coding. The prediction for each unknown bit is a function of previously detected levels in the data stream. A plurality of n-bit up-down counters, each associated with one of the possible states of prediction for an unknown bit, is utilized to arrive at a prediction of the level of the unknown bit. If the value found in the up-down counter is above a pre-specified level, a prediction will be made that the unknown bit is a one; otherwise, the prediction is zero.
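
    The counter-based predictor in this abstract can be sketched in Python. The sketch assumes that the "state of prediction" is the last `context_len` bits, that the counters saturate at the ends of their n-bit range, that they start at the threshold, and that the threshold sits at the middle of the range. None of these values come from the patent; they are placeholders. The later stage that actually compacts the error stream is not shown.

```python
def _transcode(symbols, decode, context_len=2, counter_bits=2):
    """Shared encoder/decoder loop: XOR each symbol with the predicted bit."""
    max_count = (1 << counter_bits) - 1
    threshold = max_count // 2                    # assumed "pre-specified level"
    counters = [threshold] * (1 << context_len)   # assumed initial counter value
    mask = (1 << context_len) - 1
    context = 0                                   # previous bits select the counter
    out = []
    for symbol in symbols:
        predicted = 1 if counters[context] > threshold else 0
        out_symbol = symbol ^ predicted
        # When decoding, the recovered data bit is the output; otherwise the input.
        bit = out_symbol if decode else symbol
        out.append(out_symbol)
        # Up-down update, saturating at the ends of the n-bit range (assumption).
        if bit:
            counters[context] = min(max_count, counters[context] + 1)
        else:
            counters[context] = max(0, counters[context] - 1)
        context = ((context << 1) | bit) & mask
    return out


def encode(bits, **options):
    """Return the prediction-error stream (1 wherever the prediction was wrong)."""
    return _transcode(bits, decode=False, **options)


def decode(errors, **options):
    """Rebuild the original bits from the prediction-error stream."""
    return _transcode(errors, decode=True, **options)


data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
restored = decode(encode(data))
```

    Because the decoder updates its counters from the bits it has already recovered, it makes the same predictions as the encoder and `restored` equals `data`. On predictable input the error stream is mostly zeros, which is what makes it easier to compact.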
