1.
    Invention Patent
    Unknown

    Publication No.: DE69129015D1

    Publication Date: 1998-04-09

    Application No.: DE69129015

    Application Date: 1991-12-10

    Applicant: IBM

    Abstract: The present invention relates to speech recognition, and in particular to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the technique groups the feature vectors in a space into different prototypes, at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions, and each prototype and its subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of its feature values is compared with the values of the respective prototypes or of the subclasses of the prototypes. The class of sound whose group of prototypes (or at least one of whose prototypes) has a combined value that most closely matches the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
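
    Illustrative only, not part of the patent record: a minimal Python sketch of the kind of class-grouped nearest-prototype labeling the abstract describes. The prototype values, class names, and the Euclidean distance measure are assumptions made for the example.

```python
import numpy as np

# Illustrative prototypes: each class of sound owns several prototype vectors
# (only a mean value per prototype/partition is modeled here).
prototypes = {
    "AA": [np.array([1.0, 0.2]), np.array([1.2, 0.1])],   # two prototypes for class "AA"
    "IY": [np.array([0.1, 1.1]), np.array([0.0, 0.9])],
}

def label_feature_vector(x):
    """Return the class label whose prototypes lie closest to feature vector x."""
    best_label, best_score = None, float("inf")
    for label, protos in prototypes.items():
        # Score a class by the distance to its closest prototype: one simple way
        # to combine the values of the prototypes representing that class.
        score = min(np.linalg.norm(x - p) for p in protos)
        if score < best_score:
            best_label, best_score = label, score
    return best_label

print(label_feature_vector(np.array([1.1, 0.15])))   # -> "AA"
```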

    SPEECH RECOGNITION SYSTEM
    3.
    Invention Patent

    Publication No.: CA1257697A

    Publication Date: 1989-07-18

    Application No.: CA528791

    Application Date: 1987-02-02

    Applicant: IBM

    Abstract: Apparatus and method for evaluating the likelihood of a word in a vocabulary of words, wherein a total score is evaluated for each word, each total score being the result of combining at least two word scores generated by differing algorithms. In one embodiment, a detailed acoustic match word score is combined with an approximate acoustic match word score to provide a total word score for a subject word. In another embodiment, a polling word score is combined with an acoustic match word score to provide a total word score for a subject word. The acoustic models employed in the acoustic matching may correspond, alternatively, to phonetic elements or to fenemes; fenemes represent labels generated by an acoustic processor in response to a spoken input. Apparatus and method for determining word scores according to approximate acoustic matching and according to a polling methodology are disclosed.
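
    A short sketch, not from the patent itself, of combining two per-word scores produced by different matching algorithms into a single total score. The log-probability values and the simple weighted sum are assumptions for illustration.

```python
# Hypothetical per-word scores (log probabilities) from two different matchers.
detailed_match = {"report": -12.4, "resort": -15.1, "rapport": -14.0}
approximate_match = {"report": -10.8, "resort": -11.2, "rapport": -16.5}

def total_scores(scores_a, scores_b, weight=0.5):
    """Combine two word-score tables into one total score per word."""
    return {w: weight * scores_a[w] + (1.0 - weight) * scores_b[w]
            for w in scores_a}

combined = total_scores(detailed_match, approximate_match)
best_word = max(combined, key=combined.get)
print(best_word, combined[best_word])   # the subject word with the best total score
```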

    AUTOMATIC GENERATION OF SIMPLE MARKOV MODEL STUNTED BASEFORMS FOR WORDS IN A VOCABULARY

    Publication No.: CA1238978A

    Publication Date: 1988-07-05

    Application No.: CA504802

    Application Date: 1986-03-24

    Applicant: IBM

    Abstract: The present invention addresses the problem of automatically constructing a phonetic-type baseform which, for a given word, is stunted in length relative to a fenemic baseform for that word. Specifically, in a system that (i) defines each word in a vocabulary by a fenemic baseform of fenemic phones, (ii) defines an alphabet of composite phones, each of which corresponds to at least one fenemic phone, and (iii) generates a string of fenemes in response to speech input, the present invention provides for converting a word baseform composed of fenemic phones into a stunted word baseform of composite phones by (a) replacing each fenemic phone in the fenemic word baseform by its corresponding composite phone, and (b) merging at least one pair of adjacent composite phones into a single composite phone where the adverse effect of the merging is below a predefined threshold.
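
    A hedged Python sketch of the replace-then-merge procedure the abstract outlines. The fenemic-to-composite mapping, the merge-cost function standing in for the "adverse effect", and the threshold value are placeholders, not the patent's actual models.

```python
# Placeholder mapping from fenemic phones to composite phones.
composite_of = {"f1": "C1", "f2": "C1", "f3": "C2", "f4": "C3"}

def merge_cost(a, b):
    """Stand-in for the adverse effect of merging two adjacent composite phones;
    here the cost is zero only when the two composite phones are identical."""
    return 0.0 if a == b else 1.0

def stunted_baseform(fenemic_baseform, threshold=0.5):
    # (a) replace each fenemic phone by its composite phone
    composites = [composite_of[f] for f in fenemic_baseform]
    # (b) merge adjacent composite phones whose merge cost is below the threshold
    stunted = [composites[0]]
    for c in composites[1:]:
        if merge_cost(stunted[-1], c) < threshold:
            continue            # merged: the single composite phone is already stored
        stunted.append(c)
    return stunted

print(stunted_baseform(["f1", "f2", "f3", "f4"]))   # -> ['C1', 'C2', 'C3']
```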

    IMAGE COMPACTION SYSTEM
    5.
    Invention Patent

    Publication No.: CA996275A

    Publication Date: 1976-08-31

    Application No.: CA173792

    Application Date: 1973-06-12

    Applicant: IBM

    Abstract: A data compaction system wherein segmented binary data that has redundancy between segments is compacted by means of differential run-length encoding. For compaction of digitized document data, the segments represent lines on the document. Black image points on the document, which are represented by a "1", are coded relative to the position of a 1 appearing in the line above the one being coded. The differential distances between binary 1 bit positions on successive lines are coded in accordance with a compaction code, with codewords having a small number of bits used for small differentials.
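
    To make the coding scheme concrete, here is a small Python sketch that records the 1-bit positions of each scan line as offsets from the nearest 1 in the line above. The offset representation is a simplification; the actual compaction code assigns variable-length codewords to these differentials.

```python
def one_positions(line):
    return [i for i, bit in enumerate(line) if bit == 1]

def differential_encode(lines):
    """Encode each scan line's 1-positions as offsets from the nearest 1 in the
    previous line (the first line is encoded against an all-zero reference)."""
    encoded, prev = [], []
    for line in lines:
        positions = one_positions(line)
        deltas = []
        for p in positions:
            ref = min(prev, key=lambda q: abs(q - p)) if prev else 0
            deltas.append(p - ref)
        encoded.append(deltas)
        prev = positions
    return encoded

image = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
]
print(differential_encode(image))   # small offsets dominate when successive lines are similar
```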

    SPEECH RECOGNITION APPARATUS HAVING A SPEECH CODER OUTPUTTING ACOUSTIC PROTOTYPE RANKS

    Publication No.: CA2073991C

    Publication Date: 1996-08-06

    Application No.: CA2073991

    Application Date: 1992-07-16

    Applicant: IBM

    Abstract: A speech coding and speech recognition apparatus. The value of at least one feature of an utterance is measured over each of a series of successive time intervals to produce a series of feature vector signals. The closeness of the feature value of each feature vector signal to the parameter value of each of a set of prototype vector signals is determined to obtain prototype match scores for each feature vector signal and each prototype vector signal. For each feature vector signal, first-rank and second-rank scores are associated with the prototype vector signals having the best and second-best prototype match scores, respectively. For each feature vector signal, at least the identification values and rank scores of the first-ranked and second-ranked prototype vector signals are output as a coded utterance representation signal of that feature vector signal, producing a series of coded utterance representation signals. For each of a plurality of speech units, a probabilistic model has a plurality of model outputs and an output probability for each model output, where each model output comprises the identification value of a prototype vector and a rank score. For each speech unit, a match score comprises an estimate of the probability that the probabilistic model of the speech unit would output a series of model outputs matching a reference series comprising the identification value and rank score of at least one prototype vector from each coded utterance representation signal in the series of coded utterance representation signals.
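
    An illustrative Python sketch, not the apparatus itself, of coding each feature vector by the identification values and ranks of its best-matching prototype vectors; the Euclidean distance measure and the two-prototype output are assumptions.

```python
import numpy as np

# Illustrative prototype vectors, keyed by identification value.
prototypes = {1: np.array([0.0, 0.0]), 2: np.array([1.0, 0.0]),
              3: np.array([0.0, 1.0]), 4: np.array([1.0, 1.0])}

def code_utterance(feature_vectors, top_n=2):
    """For each feature vector, output (prototype id, rank) pairs for the top_n
    best-matching prototypes, yielding a series of coded representation signals."""
    coded = []
    for x in feature_vectors:
        scores = {pid: np.linalg.norm(x - p) for pid, p in prototypes.items()}
        ranked = sorted(scores, key=scores.get)          # best match first
        coded.append([(pid, rank + 1) for rank, pid in enumerate(ranked[:top_n])])
    return coded

utterance = [np.array([0.1, 0.1]), np.array([0.9, 0.8])]
print(code_utterance(utterance))   # e.g. [[(1, 1), (2, 2)], [(4, 1), (2, 2)]]
```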

    RAPIDLY TRAINING A SPEECH RECOGNIZER TO A SUBSEQUENT SPEAKER GIVEN TRAINING DATA OF A REFERENCE SPEAKER

    Publication No.: CA1332195C

    Publication Date: 1994-09-27

    Application No.: CA570927

    Application Date: 1988-06-30

    Applicant: IBM

    Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text, after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an ℓth label output of the subsequent speaker and a kth label output of the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
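
    A rough Python sketch of the smoothing idea, assuming a toy confusion matrix and one Markov-model transition; the matrix values and the interpolation weights are invented for illustration and are not the patent's re-parameterization.

```python
import numpy as np

# Toy setting: 3 label outputs at a single Markov-model transition.
p_reference = np.array([0.6, 0.3, 0.1])       # reference speaker's label output probabilities
confusion = np.array([[0.8, 0.1, 0.1],         # confusion[l, k]: similarity between the new
                      [0.1, 0.8, 0.1],         # speaker's label l and the reference speaker's
                      [0.1, 0.1, 0.8]])        # label k (rows sum to 1 in this toy example)
p_initialized = np.full(3, 1 / 3)              # flat initial estimate
p_basic = np.array([0.5, 0.5, 0.0])            # estimate from the sparse new-speaker data

# Re-parameterize the reference probabilities through the confusion matrix,
# then smooth against the initialized distribution.
p_reparam = confusion @ p_reference
p_smoothed = 0.7 * p_reparam + 0.3 * p_initialized

# Linearly average the sparse "basic" estimate against the smoothed distribution.
p_improved = 0.5 * p_basic + 0.5 * p_smoothed
print(p_improved, p_improved.sum())            # remains a valid probability distribution
```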

    SPEECH CODING APPARATUS HAVING SPEAKER DEPENDENT PROTOTYPES GENERATED FROM A NONUSER REFERENCE DATA

    Publication No.: CA2077728A1

    Publication Date: 1993-06-06

    Application No.: CA2077728

    Application Date: 1992-09-08

    Applicant: IBM

    Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value, are stored. The closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals is determined to obtain prototype match scores for that feature vector signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user who is not in the reference set.
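
    A minimal sketch of building speaker-dependent prototypes from a pool of transformed reference vectors plus the new speaker's measured vectors. The mean-shift transform and the plain k-means clustering are assumptions standing in for the patent's procedure, and all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference speakers' feature vectors and the new speaker's (sparse) training vectors.
reference_vectors = rng.normal(0.0, 1.0, size=(200, 2))
measured_vectors = rng.normal(0.5, 1.0, size=(20, 2))

# "Synthesize" training vectors by transforming the reference vectors toward the
# new speaker (here: a crude shift by the difference of the sample means).
shift = measured_vectors.mean(axis=0) - reference_vectors.mean(axis=0)
synthesized_vectors = reference_vectors + shift

# Cluster the combined pool into speaker-dependent prototype vectors (plain k-means).
pool = np.vstack([synthesized_vectors, measured_vectors])
prototypes = pool[rng.choice(len(pool), size=4, replace=False)]
for _ in range(10):
    dists = ((pool[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)
    prototypes = np.array([pool[nearest == k].mean(axis=0) if (nearest == k).any()
                           else prototypes[k] for k in range(4)])

print(prototypes)   # four speaker-dependent prototype vectors
```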

    SPEECH RECOGNITION SYSTEM WITH EFFICIENT STORAGE AND RAPID ASSEMBLY OF PHONOLOGICAL GRAPHS

    Publication No.: CA1242028A

    Publication Date: 1988-09-13

    Application No.: CA504805

    Application Date: 1986-03-24

    Applicant: IBM

    Abstract: A continuous speech recognition system is disclosed having a speech processor and a word recognition computer subsystem, characterized by means associated with the speech processor for developing a graph of confluent links between confluent nodes; means associated with the speech processor for developing a graph of boundary links between adjacent words; means associated with the speech processor for storing an inventory of confluent links and boundary links as a coding inventory; and means associated with the speech processor for converting an unknown utterance into an encoded sequence of confluent links and boundary links corresponding to recognition sequences stored in said word recognition subsystem's recognition vocabulary for speech recognition. The invention also includes a method for achieving continuous speech recognition by characterizing speech as a sequence of confluent links which are matched with candidate words. The invention applies to isolated-word speech recognition in the same way as to continuous speech recognition, except that in that case there are no boundary links.
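
    Not part of the patent text: a small Python sketch of holding an inventory of confluent links and boundary links and encoding a word sequence against it. The link identifiers, the word-to-link mapping, and the fallback boundary link are invented placeholders.

```python
# Inventory of confluent links: each word baseform is stored as a sequence of
# confluent-link identifiers (the pieces between confluent nodes).
confluent_links = {"two": ["T_UW"], "three": ["TH_R", "R_IY"]}

# Boundary links model the junction between the last piece of one word and the
# first piece of the next word.
boundary_links = {("T_UW", "TH_R"): "B_uw_th", ("R_IY", "T_UW"): "B_iy_t"}

def encode_word_sequence(words):
    """Encode a word sequence as alternating confluent links and boundary links."""
    encoded, prev_last = [], None
    for w in words:
        links = confluent_links[w]
        if prev_last is not None:
            encoded.append(boundary_links.get((prev_last, links[0]), "B_default"))
        encoded.extend(links)
        prev_last = links[-1]
    return encoded

print(encode_word_sequence(["two", "three", "two"]))
# -> ['T_UW', 'B_uw_th', 'TH_R', 'R_IY', 'B_iy_t', 'T_UW']
```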

    ERROR CORRECTION ON BURST CHANNELS BY SEQUENTIAL DECODING

    Publication No.: CA1129030A

    Publication Date: 1982-08-03

    Application No.: CA354508

    Application Date: 1980-06-20

    Applicant: IBM

    Abstract: A sequential decoder for error correction on burst and random noise channels using convolutionally encoded data. The decoder interacts with a deinterleaver which time-demultiplexes data from a data channel from its time-multiplexed form into a predetermined transformed order. The decoder includes a memory for storing a table of likelihood values which are derived from known error statistics about the data channel, such as the probabilities of random errors and burst errors, burst error severity, and burst duration. The decoder removes an encoded subblock of data from the deinterleaver and enters it into a replica of the convolutional encoder, which calculates a syndrome bit from a combination of the presently received subblock together with a given number of previous subblocks. The syndrome bit indicates whether the current assumption of the path through the convolutional tree is correct. When there is no error in the channel, the received sequence is a code word and the syndrome bit indicates that the correct path in the convolutional tree has been taken. For each received bit, an indicator bit is calculated which is a function of the difference between the current path and the received sequence. The sequential decoder employs the syndrome bit together with burst indicator bits to calculate an address into a table of likelihood values and error pattern values. The likelihood value is used to update a total likelihood-of-error value, and the error pattern value is used to change the received subblock of data.
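
    A simplified Python sketch of the syndrome-plus-table idea for a rate-1/2 systematic convolutional code; the generator taps, the table contents, and the burst-indicator values are illustrative assumptions, not the patented decoder.

```python
# Rate-1/2 systematic convolutional code: parity_i = u_i ^ u_{i-1} ^ u_{i-2}.
TAPS = (0, 1, 2)

def parity(bits, i):
    return sum(bits[i - t] for t in TAPS if i - t >= 0) % 2

def syndromes(received_info, received_parity):
    """Replica encoder over the received info bits; a nonzero syndrome bit means
    the received pair is inconsistent with the code (some error occurred)."""
    return [parity(received_info, i) ^ received_parity[i]
            for i in range(len(received_info))]

# Likelihood table indexed by (syndrome bit, burst indicator bit): illustrative values.
likelihood_table = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 1.0}

info = [1, 0, 1, 1, 0, 0, 1]
enc_parity = [parity(info, i) for i in range(len(info))]
rx_info = info[:]                      # info bits received correctly
rx_parity = enc_parity[:]
rx_parity[3] ^= 1                      # inject one channel error in a parity bit
burst_flags = [0] * len(info)          # stand-in for the deinterleaver's burst indicators

s = syndromes(rx_info, rx_parity)
total_likelihood = sum(likelihood_table[(si, bi)] for si, bi in zip(s, burst_flags))
print(s, total_likelihood)             # nonzero syndrome flags the injected error
```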
