-
Publication Number: DE69722980T2
Publication Date: 2004-05-19
Application Number: DE69722980
Application Date: 1997-01-17
Applicant: IBM
-
Publication Number: DE3874049T2
Publication Date: 1993-04-08
Application Number: DE3874049
Application Date: 1988-06-16
Applicant: IBM
Inventor: BAHL LALIT RAI , MERCER ROBERT LEROY , NAHAMOO DAVID
Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an l-th label output of the subsequent speaker and a k-th label output of the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
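The smoothing described above can be read as two interpolation steps: re-parameterize the reference speaker's distribution through the confusion matrix, blend that with the initialized distribution, then linearly average the result against the "basic" estimate from the short training text. The sketch below (Python/NumPy) is a minimal illustration under that reading; the array shapes, the `smooth_weight` and `avg_weight` parameters, and the normalization step are assumptions, not taken from the patent.

```python
import numpy as np

def smoothed_label_probs(p_ref, confusion, p_init, p_basic,
                         smooth_weight=0.5, avg_weight=0.5):
    """Hypothetical sketch of the smoothing described in the abstract.

    p_ref:      reference-speaker label output probabilities, shape (K,)
    confusion:  confusion matrix; confusion[l, k] ~ similarity of the
                subsequent speaker's l-th label to the reference
                speaker's k-th label, shape (L, K)
    p_init:     initialized label output probabilities for the
                subsequent speaker, shape (L,)
    p_basic:    "basic" probabilities estimated from the shortened
                training text, shape (L,)
    The two interpolation weights are illustrative placeholders.
    """
    # Re-parameterize the reference distribution through the confusion matrix.
    p_repar = confusion @ p_ref
    p_repar /= p_repar.sum()

    # Combine with the initialized distribution -> "smoothed" probabilities.
    p_smooth = smooth_weight * p_repar + (1.0 - smooth_weight) * p_init

    # Linearly average the basic estimate against the smoothed one.
    return avg_weight * p_basic + (1.0 - avg_weight) * p_smooth
```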
-
Publication Number: DE3681157D1
Publication Date: 1991-10-02
Application Number: DE3681157
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: In the speech recognition system, each word is represented by a sequence of probabilistic finite-state phone machines. An acoustic processor generates acoustic labels, each from a given label alphabet, in response to a spoken input. A first table is formed in which each label of the alphabet provides a vote for each word in the vocabulary. Each label vote for a subject word indicates the likelihood that the subject word, when spoken, produces the respective label. A second table is also formed which includes a penalty that each label has for each word in the vocabulary. A penalty assigned to a given label indicates the likelihood of a word not producing the given label. Both label votes and penalties are considered in determining the likelihood score for a given word based on a string of labels.
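One plausible way to apply the two tables is to add votes for labels that occur in the observed string and to charge penalties for alphabet labels that do not; the sketch below encodes that reading. The scoring rule, the dictionary-based table layout, and all names are assumptions for illustration, not the patent's exact scheme.

```python
from collections import Counter

def word_score(label_string, word, votes, penalties, alphabet):
    """Illustrative vote/penalty scoring for one vocabulary word.

    votes[(label, word)]     ~ evidence that `word` produces `label`
    penalties[(label, word)] ~ evidence against `word` with respect to
                               `label` when that label is not produced
    """
    observed = Counter(label_string)
    score = 0.0
    # Add the vote of every observed label (weighted by its count).
    for label, count in observed.items():
        score += count * votes.get((label, word), 0.0)
    # Subtract the penalty of every alphabet label absent from the string.
    for label in alphabet:
        if label not in observed:
            score -= penalties.get((label, word), 0.0)
    return score
```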
-
Publication Number: DE3681156D1
Publication Date: 1991-10-02
Application Number: DE3681156
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , DESOUZA PETER VINCENT , MERCER ROBERT LEROY , PICHENY MICHAEL ALAN
Abstract: In the method of constructing a stunted phonetic-type word baseform, each fenemic phone in the fenemic phone word baseform is replaced by the corresponding composite phone. At least one pair of adjacent composite phones is merged into a single composite phone where the adverse effect of the merging is below a predefined threshold. The two steps are repeated until a word baseform of a given length is achieved. In the merging step, the pair of adjacent composite phones to be replaced by a single composite phone may be chosen so that the replacement has less adverse effect than replacing any other two adjacent composite phones with a single composite phone.
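A greedy variant of the merging loop might look like the following sketch: at each step the adjacent pair whose merge causes the least adverse effect is replaced, until the target length is reached or no acceptable merge remains. The `merge_cost` and `merge` callables, the "cheapest pair first" order, and the stopping conditions are assumptions layered on the abstract.

```python
def shorten_baseform(baseform, merge_cost, merge, target_len, max_cost):
    """Illustrative sketch of the iterative merging described above.

    baseform:   list of composite phones (already substituted for
                fenemic phones)
    merge_cost: callable (a, b) -> adverse effect of replacing the
                adjacent pair (a, b) with a single composite phone
    merge:      callable (a, b) -> the composite phone replacing the pair
    """
    phones = list(baseform)
    while len(phones) > max(target_len, 1):
        # Find the adjacent pair whose merge has the least adverse effect.
        costs = [merge_cost(a, b) for a, b in zip(phones, phones[1:])]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] >= max_cost:          # no acceptable merge remains
            break
        phones[i:i + 2] = [merge(phones[i], phones[i + 1])]
    return phones
```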
-
Publication Number: DE3681155D1
Publication Date: 1991-10-02
Application Number: DE3681155
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , JELINEK FREDERICK , MERCER ROBERT LEROY
Abstract: A speech recognition system has an acoustic processor which generates a string of acoustic labels in response to speech input and a decoder which matches words in a vocabulary against the generated labels in the string. A string of labels is generated in response to a speech input, and words are selected from a vocabulary as possible first words corresponding to labels at the beginning of the string. For a subject selected word, a most likely boundary label interval in the string is located where the subject selected word has the highest probability of ending. A respective likelihood of the subject selected word at each label interval of the string, up to and including the most likely boundary label interval, is evaluated. The process is repeated for each selected word as the subject selected word. A given selected word is classified as extendible if its likelihood at the particular label interval corresponding to the most likely boundary label interval is within a predefined range of the highest likelihood for any selected word at that particular label interval.
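The extendibility test could be prototyped roughly as below, assuming each word's likelihood curve over label intervals is already available and its own argmax is taken as the most likely boundary interval; the data layout and the `margin` parameter are illustrative assumptions, not the patent's definitions.

```python
def extendible_words(likelihoods, margin):
    """Sketch of the extendibility test described in the abstract.

    likelihoods[word]: list of (log-)likelihoods for that word at label
    intervals 0..T of the string.  `margin` is the predefined range.
    """
    # Most likely boundary interval per word (argmax of its own curve).
    boundary = {w: max(range(len(ls)), key=ls.__getitem__)
                for w, ls in likelihoods.items()}

    extendible = []
    for word, t in boundary.items():
        # Best likelihood achieved by *any* selected word at interval t.
        best_at_t = max(ls[t] for ls in likelihoods.values() if len(ls) > t)
        if likelihoods[word][t] >= best_at_t - margin:
            extendible.append(word)
    return extendible
```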
-
Publication Number: DE3852608T2
Publication Date: 1995-07-06
Application Number: DE3852608
Application Date: 1988-10-19
Applicant: IBM
Inventor: BAHL LALIT RAI , BROWN PETER FITZHUGH , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: In order to determine a next event based upon available data, a binary decision tree is constructed having true or false questions at each node and a probability distribution of the unknown next event based upon available data at each leaf. Starting at the root of the tree, the construction process proceeds from node-to-node towards a leaf by answering the question at each node encountered and following either the true or false path depending upon the answer. The questions are phrased in terms of the available data and are designed to provide as much information as possible about the next unknown event. The process is particularly useful in speech recognition when the next word to be spoken is determined on the basis of the previously spoken words.
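A minimal rendering of the tree walk is sketched below, assuming each internal node holds a true/false question over the available data and each leaf holds a probability distribution over next events; the node layout and all names are hypothetical. For word prediction, the questions would typically ask about the previously spoken words, e.g. "is the preceding word 'the'?".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Node:
    """A node of the binary decision tree sketched from the abstract."""
    question: Optional[Callable[[dict], bool]] = None   # None at a leaf
    true_child: Optional["Node"] = None
    false_child: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None     # set only at leaves

def predict_next_event(root: Node, available_data: dict) -> Dict[str, float]:
    """Walk from the root to a leaf by answering each node's question,
    following the true or false branch, and return the leaf's
    distribution over the unknown next event."""
    node = root
    while node.question is not None:
        node = node.true_child if node.question(available_data) else node.false_child
    return node.distribution
```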
-
Publication Number: DE69010941D1
Publication Date: 1994-09-01
Application Number: DE69010941
Application Date: 1990-02-27
Applicant: IBM
Inventor: BAHL LALIT RAI , BROWN PETER FITZHUGH , DESOUZA PETER VINCENT , MERCER ROBERT LEROY
Abstract: A continuous speech recognition system includes an automatic phonological rules generator which determines variations in the pronunciation of phonemes based on the context in which they occur. This phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text. These sequences are then annotated with their phoneme context from the training text and clustered into groups representing similar pronunciations of each phoneme. A decision tree is generated using the context information of the sequences to predict the clusters to which the sequences belong. The training data is processed by the decision tree to divide the sequences into leaf-groups representing similar pronunciations of each phoneme. The sequences in each leaf-group are clustered into sub-groups representing respectively different pronunciations of their corresponding phoneme in a given context. A Markov model is generated for each sub-group. The various Markov models of a leaf-group are combined into a single compound model by assigning common initial and final states to each model. The compound Markov models are used by a speech recognition system to analyze an unknown sequence of labels given its context.
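The final combination step, giving the alternative Markov models of a leaf-group a shared initial and final state, might be sketched as follows. The dict-based model representation, the equal entry weights, and the assumption that the first and last listed states are each sub-model's entry and exit states are all illustrative choices, not the patent's representation.

```python
def build_compound_model(sub_models):
    """Combine alternative Markov models of a leaf-group into one
    compound model with a common initial and final state.

    Each sub-model is assumed to be {"states": [...], "transitions":
    [(src, dst, prob, label), ...]} with its first/last state as
    entry/exit.
    """
    compound = {"states": ["INIT", "FINAL"], "transitions": []}
    entry_prob = 1.0 / len(sub_models)    # assume equal weight per alternative
    for i, model in enumerate(sub_models):
        rename = {s: f"m{i}:{s}" for s in model["states"]}
        compound["states"].extend(rename.values())
        # Null transition from the shared initial state into this alternative.
        compound["transitions"].append(
            ("INIT", rename[model["states"][0]], entry_prob, None))
        for src, dst, prob, label in model["transitions"]:
            compound["transitions"].append((rename[src], rename[dst], prob, label))
        # Null transition from this alternative's exit to the shared final state.
        compound["transitions"].append(
            (rename[model["states"][-1]], "FINAL", 1.0, None))
    return compound
```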
-
Publication Number: DE3878541D1
Publication Date: 1993-03-25
Application Number: DE3878541
Application Date: 1988-12-12
Applicant: IBM
Inventor: BAHL LALIT RAI , DESOUZA PETER VINCENT , MERCER ROBERT LEROY , PICHENY MICHAEL ALAN
Abstract: In a speech recognition system, a technique is disclosed for segmenting multiple utterances of a vocabulary word in a consistent manner, to determine a Markov model representation for each segment. Plural utterances of a word are converted to label strings. One is selected as the prototype and represented by a sequence of Markov models. All other strings are aligned against the prototype, using stored probabilities, thereby determining substrings and thus segments which correspond to the labels of the prototype sequence. Corresponding segments of all strings are evaluated together to finally determine a suitable Markov model representation for each segment. The concatenation of these segment models represents the baseform for the word.
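The segment-collection step can be pictured as in the sketch below, where a naive proportional mapping stands in for the probability-based (Viterbi) alignment against the prototype mentioned in the abstract; the data structures and names are assumptions for illustration only.

```python
def collect_segments(prototype, utterances):
    """Group each utterance's labels into segments corresponding to the
    labels of the prototype string, so that matching segments across
    utterances can be evaluated together.

    prototype:  label string chosen as the prototype
    utterances: list of label strings from the other utterances
    """
    n = len(prototype)
    segments = [[] for _ in range(n)]           # one bucket per prototype label
    for labels in utterances:
        per_utt = [[] for _ in range(n)]
        for j, label in enumerate(labels):
            # Naive stand-in for the alignment: map position j of this
            # utterance proportionally onto a prototype position.
            i = min(n - 1, j * n // max(len(labels), 1))
            per_utt[i].append(label)
        for i in range(n):
            segments[i].append(per_utt[i])
    return segments
```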
-
Publication Number: DE3685435D1
Publication Date: 1992-06-25
Application Number: DE3685435
Application Date: 1986-03-27
Applicant: IBM
Inventor: BAHL LALIT RAI , COHEN PAUL SHELDON , MERCER ROBERT LEROY
Abstract: In the speech recognition system, each word model phonetic graph is separated into left and right hooks, which are influenced by context, and internal portions, which are not influenced by context. Each internal portion between two confluent nodes is identified as a confluent link. Each confluent link is represented as a basic subgraph, and each different basic subgraph is stored once for the whole vocabulary. Each hook pair which can form a boundary between two words is combined into a boundary subgraph. Each different boundary subgraph is stored once for the whole vocabulary. During the recognition process, for each current word to be matched against speech input, the boundary subgraph corresponding to the right hook of the last recognized word in combination with the left hook of the current word is fetched.
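The once-per-vocabulary storage and the boundary-subgraph lookup at recognition time might be organized as in this sketch; the hashable key format, the class interface, and the names are assumptions for illustration.

```python
class SubgraphStore:
    """Illustrative once-per-vocabulary storage of basic (confluent-link)
    and boundary subgraphs, keyed by hashable identifiers."""

    def __init__(self):
        self.basic = {}      # confluent-link subgraphs, each stored once
        self.boundary = {}   # boundary subgraphs, keyed by (right_hook, left_hook)

    def add_confluent_link(self, key, subgraph):
        # setdefault keeps only one copy of each distinct basic subgraph.
        self.basic.setdefault(key, subgraph)

    def add_boundary(self, right_hook, left_hook, subgraph):
        self.boundary.setdefault((right_hook, left_hook), subgraph)

    def fetch_boundary(self, last_word_right_hook, current_word_left_hook):
        # At recognition time, fetch the boundary subgraph for the pair
        # (right hook of last recognized word, left hook of current word).
        return self.boundary[(last_word_right_hook, current_word_left_hook)]
```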
-