FAST ALGORITHM FOR DERIVING ACOUSTIC PROTOTYPES FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: CA2068041A1

    Publication date: 1993-01-17

    Application No.: CA2068041

    Filing date: 1992-05-05

    Applicant: IBM

    Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes means for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. Means are provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. Means are provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises means for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes means for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
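
    Illustrative sketch (not part of the patent record): the clustering and prototype steps described in this abstract can be approximated in a few lines of Python. The function names, the (word-segment, position, elementary-model) key layout, and the choice to use each cluster value directly as one partition value are assumptions made for illustration. The abstract only says that partition values are "combinations" of cluster values, and the path estimation that yields the alignment is taken as given here.

```python
from collections import defaultdict


def cluster_by_model_location(vectors, alignment):
    """Average the feature vectors that share one alignment key.

    `alignment[i]` is a hypothetical (word_segment, position, elementary_model)
    tuple naming the model location estimated to have produced `vectors[i]`.
    """
    groups = defaultdict(list)
    for vec, key in zip(vectors, alignment):
        groups[key].append(vec)
    # Cluster value = per-dimension mean of the vectors in the cluster.
    return {
        key: [sum(col) / len(col) for col in zip(*vecs)]
        for key, vecs in groups.items()
    }


def build_prototypes(cluster_values):
    """Collect, per elementary model, the cluster values of all its locations.

    Assumption: each cluster value is used unchanged as one partition value.
    """
    prototypes = defaultdict(list)
    for (_segment, _position, model), value in sorted(cluster_values.items()):
        prototypes[model].append(value)
    return dict(prototypes)


vectors = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [5.0, 5.0]]
alignment = [
    ("seg1", 0, "A"),
    ("seg1", 0, "A"),
    ("seg1", 1, "B"),
    ("seg2", 0, "A"),
]
clusters = cluster_by_model_location(vectors, alignment)
prototypes = build_prototypes(clusters)
print(prototypes)
```

    In this toy input, elementary model "A" occurs at two locations and therefore ends up with two partition values, while "B" has one.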

    SPEAKER-INDEPENDENT LABEL CODING APPARATUS

    Publication No.: CA2060591A1

    Publication date: 1992-09-23

    Application No.: CA2060591

    Filing date: 1992-02-04

    Applicant: IBM

    Abstract: The present invention relates to speech recognition, and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique of the present invention groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound whose group of prototypes, or at least one of whose prototypes, has a combined value that most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
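
    Illustrative sketch (not part of the patent record): the labeling step can be pictured as a nearest-prototype search in which each class of sound owns several prototypes. The function name, the class labels, and the use of Euclidean distance as the "closest match" criterion are assumptions for illustration; the abstract does not specify the matching measure.

```python
import math


def label_vector(vec, prototypes):
    """Return the label of the class that owns the closest prototype.

    `prototypes` maps a class label to a list of prototype vectors, so one
    class of sound may be represented by two or more prototypes.
    """
    best_label, best_dist = None, math.inf
    for label, protos in prototypes.items():
        for proto in protos:
            dist = math.dist(vec, proto)  # Euclidean distance (assumed measure)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


prototypes = {
    "AA": [[0.0, 0.0], [1.0, 1.0]],
    "SS": [[5.0, 5.0], [6.0, 4.0]],
}
print(label_vector([0.9, 1.2], prototypes))
print(label_vector([5.5, 4.4], prototypes))
```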

    APPARATUS AND METHOD FOR DETERMINING A LIKELY WORD SEQUENCE FROM LABELS GENERATED BY AN ACOUSTIC PROCESSOR

    Publication No.: CA1248633A

    Publication date: 1989-01-10

    Application No.: CA504800

    Filing date: 1986-03-24

    Applicant: IBM

    Abstract: The present invention addresses the problem of determining, in a speech recognition context, a likely sequence or path of words from a plurality of word paths given a string of labels that are generated at successive intervals. The invention features multiple stack decoding and a unique strategy for extending one word path at a time without undue reliance on word path length. With multiple stack decoding, a stack is associated with each label of the label string. Word paths that most likely end at a given label are assigned to the stack corresponding to the given label and are ordered according to likelihood at the given label. The strategy of deciding which word path to extend includes the forming of a likelihood envelope against which the word paths are compared to determine if a word path is sufficiently likely to be extended. From among the word paths that are found to be extendible, the word path of highest likelihood in the earliest stack (i.e., the shortest most likely word path) is selected for extension. After a word path is extended, it is deleted from its stack and the word paths extended therefrom are entered into appropriate stacks.
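
    Illustrative sketch (not part of the patent record): the selection and extension steps can be outlined as follows. The data layout (one list of (log-likelihood, words) tuples per label position), the function names, and the treatment of the envelope as a precomputed per-label threshold are assumptions; the abstract does not say how the envelope is formed, so it is simply passed in.

```python
def select_path(stacks, envelope):
    """Pick the most likely extendible path from the earliest stack.

    stacks[t] holds (log_likelihood, words) tuples for word paths that most
    likely end at label t; envelope[t] is the threshold a path in stack t
    must exceed to count as extendible.
    """
    for t, stack in enumerate(stacks):
        extendible = [entry for entry in stack if entry[0] > envelope[t]]
        if extendible:
            return t, max(extendible, key=lambda entry: entry[0])
    return None


def extend_path(stacks, t, entry, extensions):
    """Delete the chosen path from its stack and file its extensions.

    `extensions` is a hypothetical list of (end_label, log_likelihood, word)
    tuples, as would be produced by a word-matching stage.
    """
    stacks[t].remove(entry)
    _, words = entry
    for end_label, log_likelihood, word in extensions:
        stacks[end_label].append((log_likelihood, words + (word,)))


stacks = [[], [(-5.0, ("the",)), (-9.0, ("a",))], [(-6.0, ("they",))]]
strict = select_path(stacks, envelope=[0.0, -4.0, -7.0])
loose = select_path(stacks, envelope=[0.0, -8.0, -7.0])
print(strict, loose)

t, entry = loose
extend_path(stacks, t, entry, [(2, -7.5, "cat")])
print(stacks)
```

    With the stricter envelope no path in stack 1 qualifies, so the search moves on to stack 2; with the looser one, the best path in the earlier stack is chosen even though a later stack exists.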

    APPARATUS AND METHOD FOR PRODUCING A LIST OF LIKELY CANDIDATE WORDS CORRESPONDING TO A SPOKEN INPUT

    Publication No.: CA1246229A

    Publication date: 1988-12-06

    Application No.: CA504806

    Filing date: 1986-03-24

    Applicant: IBM

    Abstract: A speech recognition apparatus and method of selecting likely words from a vocabulary of words, wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, include: (a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote; (b) forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word; and (c) for a given string of labels, determining the likelihood of a particular word which includes the step of combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.
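
    Illustrative sketch (not part of the patent record): step (c) can be read as a table lookup followed by a sum. The function name, the table layout, the toy numbers, and the choice to treat votes and penalties as additive log-domain scores (with a vote counted once per occurrence of a label) are assumptions for illustration.

```python
def word_score(label_string, votes, penalties, alphabet, word):
    """Combine votes for labels in the string with penalties for absent labels.

    votes[word][label] and penalties[word][label] are assumed to be
    log-domain scores, so combining them is a plain sum.
    """
    present = set(label_string)
    score = sum(votes[word][label] for label in label_string)
    score += sum(
        penalties[word][label] for label in alphabet if label not in present
    )
    return score


alphabet = ["L1", "L2", "L3"]
votes = {
    "cat": {"L1": -1.0, "L2": -2.0, "L3": -5.0},
    "dog": {"L1": -4.0, "L2": -1.0, "L3": -1.0},
}
penalties = {
    "cat": {"L1": -3.0, "L2": -2.0, "L3": -0.5},
    "dog": {"L1": -0.5, "L2": -2.0, "L3": -3.0},
}
labels = ["L1", "L2", "L1"]

scores = {w: word_score(labels, votes, penalties, alphabet, w) for w in votes}
candidates = sorted(scores, key=scores.get, reverse=True)
print(scores, candidates)
```

    Sorting the vocabulary by this score gives the kind of candidate list the title refers to; here "cat" ranks above "dog" because "dog" is penalized for the missing label L3.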

    FENEME-BASED MARKOV MODELS FOR WORDS

    Publication No.: CA1236578A

    Publication date: 1988-05-10

    Application No.: CA496161

    Filing date: 1985-11-26

    Applicant: IBM

    Abstract: In a speech recognition system, apparatus and method for modelling words with label-based Markov models are disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.

    36.
    Invention patent
    Unknown

    Publication No.: FR2326081A1

    Publication date: 1977-04-22

    Application No.: FR7626313

    Filing date: 1976-08-25

    Applicant: IBM

    Abstract: An apparatus is disclosed for compressing a p x q image array of two-valued (black/white) sample points. The image array points are serially applied to the apparatus in consecutive raster scan lines. In response, the apparatus simultaneously forms two matrices respectively representing a high order p x q predictive error array and a p x q array of location events (such as the raster leading edges of all objects in the image). Improved compression is achieved by selecting the more compression-efficient of two methods for encoding the position of errors in the prediction error array. These alternative methods are conventional run-length coding and a novel form of reference encoding, used selectively but to significant advantage. Thus, a run-length compression codeword is formed from the count C of non-errors between consecutive errors (in response to the occurrence of each error in the jth bit position of the ith scan line of the predictive error array) upon either C ≤ T, or C > T and there being no occurrence of a line difference encoding for the error (where i, j, C and T are positive integers). A line difference codeword with difference value v is generated upon the joint event of C > T and either the single or multiple occurrence of location events in the (i-1)th scan line of the location event array within the bit position range of B

    37.
    Invention patent
    Unknown

    Publication No.: DE2340250A1

    Publication date: 1974-02-28

    Application No.: DE2340250

    Filing date: 1973-08-09

    Applicant: IBM

    Abstract: A data compaction system wherein segmented binary data that has redundancy between segments is compacted by means of differential run-length encoding. For compaction of digitized document data, the segments represent lines on the document. Black image points on the document, which are represented by a "1", are coded relative to the position of a 1 appearing in the line above the one being coded. The differential distance between binary 1 bit positions on successive lines is coded in accordance with a compaction code. Codewords having a small number of bits are used for small differentials.
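
    Illustrative sketch (not part of the patent record): a toy version of coding 1 bits relative to the line above. Pairing the k-th 1 of the current line with the k-th 1 of the previous line, requiring both lines to hold the same number of 1 bits, and the zigzag-unary codeword are all simplifying assumptions; the abstract does not describe the actual pairing rule or compaction code.

```python
def one_positions(line):
    """Return the bit positions that hold a 1."""
    return [i for i, bit in enumerate(line) if bit == 1]


def line_differentials(prev_line, cur_line):
    """Signed distance from each 1 in the previous line to its partner below."""
    prev_pos, cur_pos = one_positions(prev_line), one_positions(cur_line)
    if len(prev_pos) != len(cur_pos):
        raise ValueError("sketch assumes equal numbers of 1 bits on both lines")
    return [c - p for p, c in zip(prev_pos, cur_pos)]


def codeword(diff):
    """Hypothetical code: shorter bit strings for smaller |diff|.

    Zigzag order 0, -1, +1, -2, +2, ... followed by unary coding.
    """
    n = 2 * diff if diff >= 0 else -2 * diff - 1
    return "0" * n + "1"


def decode_line(prev_line, diffs):
    """Rebuild the current line from the previous line and the differentials."""
    cur_line = [0] * len(prev_line)
    for pos, diff in zip(one_positions(prev_line), diffs):
        cur_line[pos + diff] = 1
    return cur_line


prev_line = [0, 0, 1, 0, 0, 0, 1, 0]
cur_line = [0, 0, 0, 1, 0, 0, 1, 0]
diffs = line_differentials(prev_line, cur_line)
bits = "".join(codeword(d) for d in diffs)
print(diffs, bits)
```

    An edge that stays in the same column costs a single bit in this toy code, which is the kind of between-line redundancy the abstract aims to exploit.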

    38.
    Invention patent
    Unknown

    Publication No.: DE2340230A1

    Publication date: 1974-02-28

    Application No.: DE2340230

    Filing date: 1973-08-09

    Applicant: IBM

    Abstract: A system for compacting digital data by means of prediction error coding. Prediction for each unknown bit is a function of previously detected levels in the data stream. A plurality of n-bit up-down counters, each associated with one of the possible states of prediction for an unknown bit, is utilized to arrive at a prediction of the level of the unknown bit. If the value found in the up-down counter is above a pre-specified level, a prediction will be made that the unknown bit is a one; otherwise, the prediction is zero.
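
    Illustrative sketch (not part of the patent record): a small counter-based predictor in the spirit of this abstract. The state is taken to be the previous `context_bits` bits, the counters saturate at their n-bit limits, and a counter counts up on a 1 and down on a 0. These details, like all names and default parameters, are assumptions; the abstract states only the threshold rule for the prediction.

```python
def _run(stream, context_bits, counter_bits, decoding):
    """Shared loop for the encoder and decoder so both stay in lockstep."""
    max_count = (1 << counter_bits) - 1
    threshold = max_count // 2              # the "pre-specified level" (assumed)
    counters = [threshold] * (1 << context_bits)
    mask = (1 << context_bits) - 1
    state, out = 0, []
    for symbol in stream:
        prediction = 1 if counters[state] > threshold else 0
        # Encoding: symbol is a data bit. Decoding: symbol is an error bit.
        bit = symbol ^ prediction if decoding else symbol
        out.append(bit if decoding else bit ^ prediction)
        # Saturating up-down update driven by the actual data bit.
        if bit:
            counters[state] = min(max_count, counters[state] + 1)
        else:
            counters[state] = max(0, counters[state] - 1)
        state = ((state << 1) | bit) & mask
    return out


def encode(bits, context_bits=2, counter_bits=3):
    """Return the prediction-error stream (1 where the prediction was wrong)."""
    return _run(bits, context_bits, counter_bits, decoding=False)


def decode(errors, context_bits=2, counter_bits=3):
    """Recover the original bits from the prediction-error stream."""
    return _run(errors, context_bits, counter_bits, decoding=True)


errors = encode([1, 1, 1, 1, 1, 1, 1, 1])
print(errors)
```

    Once the counters have adapted, a predictable stream produces long runs of zero error bits, which a subsequent run-length or entropy coder can compact.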
