METHOD AND SYSTEM FOR NATURAL LANGUAGE TRANSLATION

    Publication No.: CA2068780C

    Publication date: 1998-12-22

    Application No.: CA2068780

    Application date: 1992-05-15

    Applicant: IBM

    Abstract: The present invention is a system for translating text from a first source language into a second target language. The system assigns probabilities or scores to various target-language translations and then displays or otherwise makes available the highest-scoring translations. The source text is first transduced into one or more intermediate structural representations. From these intermediate source structures a set of intermediate target-structure hypotheses is generated. These hypotheses are scored by two different models: a language model which assigns a probability or score to an intermediate target structure, and a translation model which assigns a probability or score to the event that an intermediate target structure is translated into an intermediate source structure. Scores from the translation model and language model are combined into a combined score for each intermediate target-structure hypothesis. Finally, a set of target-text hypotheses is produced by transducing the highest-scoring target-structure hypotheses into portions of text in the target language. The system can either run in batch mode, in which case it translates source-language text into a target language without human assistance, or it can function as an aid to a human translator. When functioning as an aid to a human translator, the human may simply select from the various translation hypotheses provided by the system, or he may optionally provide hints or constraints on how to perform one or more of the stages of source transduction, hypothesis generation and target transduction.
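The scoring step described in this abstract can be illustrated with a minimal sketch. The abstract only says the two scores are "combined"; adding log-probabilities (equivalent to multiplying probabilities) is an assumption made here, and the names `combined_score`, `best_hypotheses`, and the toy probabilities are hypothetical.

```python
import math

def combined_score(lm_logprob, tm_logprob):
    # Assumption: combine the language-model and translation-model
    # scores by adding log-probabilities. The abstract does not fix
    # the combination rule.
    return lm_logprob + tm_logprob

def best_hypotheses(hypotheses, n=1):
    # hypotheses: list of (target_text, lm_logprob, tm_logprob) tuples.
    ranked = sorted(
        hypotheses,
        key=lambda h: combined_score(h[1], h[2]),
        reverse=True,
    )
    return [h[0] for h in ranked[:n]]

# Toy, made-up scores for three target hypotheses.
candidates = [
    ("the house", math.log(0.04), math.log(0.5)),
    ("house the", math.log(0.001), math.log(0.5)),
    ("the home", math.log(0.02), math.log(0.3)),
]
top_two = best_hypotheses(candidates, n=2)
```

Returning the top `n` hypotheses rather than a single winner mirrors the translator-aid mode, in which a human selects among several proposals.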

    Publication No.: DE69031099D1

    Publication date: 1997-09-04

    Application No.: DE69031099

    Application date: 1990-05-07

    Applicant: IBM

    Abstract: A method of detecting and correcting an error in a string of information signals. When each information signal represents a word, the method detects and corrects spelling errors. The method detects and corrects an error which is a properly spelled word, but which is the wrong (not intended) word. For example, the method is capable of detecting and correcting a misspelling of "HORSE" as "HOUSE". In the spelling error detection and correction method, a first word in an input string of words is changed to form a second word different from the first word to form a candidate string of words. The spellings of the first word and the second word are in the spelling dictionary. The probability of occurrence of the input string of words is compared to the product of the probability of occurrence of the candidate string of words multiplied by the probability of misrepresenting the candidate string of words as the input string of words. If the former is greater than or equal to the latter, no correction is made. If the former is less than the latter, the candidate string of words is selected as a spelling correction.
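The comparison at the end of this abstract amounts to a single inequality. The sketch below assumes the three probabilities are already available from some language model and error model; the function name `should_correct` and the numeric values are hypothetical.

```python
def should_correct(p_input, p_candidate, p_misrepresent):
    # p_input:        probability of occurrence of the input string
    # p_candidate:    probability of occurrence of the candidate string
    # p_misrepresent: probability of misrepresenting the candidate
    #                 string as the input string
    # The candidate is selected only when the input probability is
    # strictly less than the product; a tie means no correction.
    return p_input < p_candidate * p_misrepresent

# Made-up numbers: the candidate is far more probable than the input.
accept = should_correct(1e-9, 1e-6, 0.01)
# Tie case: 0.25 is equal to 0.5 * 0.5, so no correction is made.
tie = should_correct(0.25, 0.5, 0.5)
```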

    SPEECH RECOGNITION SYSTEM

    Publication No.: CA1257697A

    Publication date: 1989-07-18

    Application No.: CA528791

    Application date: 1987-02-02

    Applicant: IBM

    Abstract: Apparatus and method for evaluating the likelihood of a word in a vocabulary of words wherein a total score is evaluated for each word, each total score being the result of combining at least two word scores generated by differing algorithms. In one embodiment, a detailed acoustic match word score is combined with an approximate acoustic match word score to provide a total word score for a subject word. In another embodiment, a polling word score is combined with an acoustic match word score to provide a total word score for a subject word. The acoustic models employed in the acoustic matching may correspond, alternatively, to phonetic elements or to fenemes. Fenemes represent labels generated by an acoustic processor in response to a spoken input. Apparatus and method for determining word scores according to approximate acoustic matching and for determining word scores according to a polling methodology are disclosed.

    AUTOMATIC GENERATION OF SIMPLE MARKOV MODEL STUNTED BASEFORMS FOR WORDS IN A VOCABULARY

    Publication No.: CA1238978A

    Publication date: 1988-07-05

    Application No.: CA504802

    Application date: 1986-03-24

    Applicant: IBM

    Abstract: The present invention addresses the problem of automatically constructing a phonetic-type baseform which, for a given word, is stunted in length relative to a fenemic baseform for the given word. Specifically, in a system that (i) defines each word in a vocabulary by a fenemic baseform of fenemic phones, (ii) defines an alphabet of composite phones each of which corresponds to at least one fenemic phone, and (iii) generates a string of fenemes in response to speech input, the present invention provides for converting a word baseform comprised of fenemic phones into a stunted word baseform of composite phones by (a) replacing each fenemic phone in the fenemic phone word baseform by the composite phone corresponding thereto; and (b) merging at least one pair of adjacent composite phones into a single composite phone where the adverse effect of the merging is below a predefined threshold.
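Steps (a) and (b) can be sketched as follows. The abstract does not say how the "adverse effect" of a merge is measured or in what order pairs are considered, so this sketch assumes a precomputed lookup table of merge costs and a single greedy left-to-right pass. All names (`stunt_baseform`, `to_composite`, `merge_table`) and the toy data are hypothetical.

```python
def stunt_baseform(fenemic_phones, to_composite, merge_table, threshold):
    # Step (a): replace each fenemic phone by its composite phone.
    composite = [to_composite[p] for p in fenemic_phones]

    # Step (b): greedily merge adjacent pairs whose cost is below
    # the threshold. merge_table maps (left, right) to a tuple
    # (merged_phone, cost); pairs absent from it are never merged.
    result = []
    for phone in composite:
        if result:
            entry = merge_table.get((result[-1], phone))
            if entry is not None and entry[1] < threshold:
                result[-1] = entry[0]
                continue
        result.append(phone)
    return result

# Toy data: four fenemic phones mapped onto three composite phones.
to_composite = {"f1": "A", "f2": "A", "f3": "B", "f4": "C"}
merge_table = {("A", "A"): ("A", 0.1), ("B", "C"): ("B", 0.9)}
stunted = stunt_baseform(["f1", "f2", "f3", "f4"], to_composite, merge_table, 0.5)
```

Here the two adjacent `A` phones merge because their cost (0.1) is below the threshold, while the `B`/`C` pair (0.9) is left alone, so the four-phone baseform shrinks to three.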

    LANGUAGE TRANSLATION APPARATUS AND METHOD USING CONTEXT-BASED TRANSLATION MODELS

    Publication No.: CA2125200C

    Publication date: 1999-03-02

    Application No.: CA2125200

    Application date: 1994-06-06

    Applicant: IBM

    Abstract: An apparatus for translating a series of source words in a first language to a series of target words in a second language. For an input series of source words, at least two target hypotheses, each comprising a series of target words, are generated. Each target word has a context comprising at least one other word in the target hypothesis. For each target hypothesis, a language model match score comprises an estimate of the probability of occurrence of the series of words in the target hypothesis. At least one alignment connecting each source word with at least one target word in the target hypothesis is identified. For each source word and each target hypothesis, a word match score comprises an estimate of the conditional probability of occurrence of the source word, given the target word in the target hypothesis which is connected to the source word and given the context in the target hypothesis of the target word which is connected to the source word. For each target hypothesis, a translation match score comprises a combination of the word match scores for the target hypothesis and the source words in the input series of source words. A target hypothesis match score comprises a combination of the language model match score for the target hypothesis and the translation match score for the target hypothesis. The target hypothesis having the best target hypothesis match score is output.
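The context-dependent word match score can be made concrete with a small sketch. It makes three simplifying assumptions that the abstract does not state: each source word is connected to exactly one target word, the context of a target word is just the preceding target word, and word match scores are combined by summing log-probabilities. The function name, the `word_prob` table, and its values are hypothetical.

```python
import math

def translation_match_score(source_words, target_words, alignment, word_prob):
    # alignment[i] is the index of the target word connected to
    # source word i (assumption: exactly one connection per word).
    total = 0.0
    for i, src in enumerate(source_words):
        j = alignment[i]
        # Assumption: context is the previous target word, with a
        # sentence-start marker for the first position.
        context = target_words[j - 1] if j > 0 else "<s>"
        # word_prob[(source, target, context)] estimates
        # P(source word | target word, context).
        total += math.log(word_prob[(src, target_words[j], context)])
    return total

# Made-up conditional probabilities for a two-word example.
word_prob = {
    ("la", "the", "<s>"): 0.5,
    ("maison", "house", "the"): 0.5,
}
score = translation_match_score(
    ["la", "maison"], ["the", "house"], [0, 1], word_prob
)
```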

    RAPIDLY TRAINING A SPEECH RECOGNIZER TO A SUBSEQUENT SPEAKER GIVEN TRAINING DATA OF A REFERENCE SPEAKER

    Publication No.: CA1332195C

    Publication date: 1994-09-27

    Application No.: CA570927

    Application date: 1988-06-30

    Applicant: IBM

    Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
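The final step, linearly averaging "basic" against "smoothed" label output probabilities, is a weighted interpolation. The sketch below assumes a single fixed weight; the abstract does not say how the weight is chosen, and the name `interpolate` and the toy distributions are hypothetical.

```python
def interpolate(basic, smoothed, weight):
    # weight is the share given to the basic estimate; the
    # remainder goes to the smoothed estimate. Assumption: one
    # scalar weight is used for every label.
    return [weight * b + (1.0 - weight) * s for b, s in zip(basic, smoothed)]

# Sparse data gives an overconfident basic estimate; the smoothed
# distribution pulls it back toward the other label.
improved = interpolate([1.0, 0.0], [0.5, 0.5], 0.5)
```

Because both inputs are probability distributions and the weights sum to one, the result is again a distribution.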

    SPEECH RECOGNITION SYSTEM WITH EFFICIENT STORAGE AND RAPID ASSEMBLY OF PHONOLOGICAL GRAPHS

    Publication No.: CA1242028A

    Publication date: 1988-09-13

    Application No.: CA504805

    Application date: 1986-03-24

    Applicant: IBM

    Abstract: A continuous speech recognition system is disclosed having a speech processor and a word recognition computer subsystem, characterized by means associated with the speech processor for developing a graph of confluent links between confluent nodes; means associated with the speech processor for developing a graph of boundary links between adjacent words; means associated with the speech processor for storing an inventory of confluent links and boundary links as a coding inventory; means associated with the speech processor for converting an unknown utterance into an encoded sequence of confluent links and boundary links corresponding to recognition sequences stored in said word recognition subsystem recognition vocabulary for speech recognition. The invention also includes a method for achieving continuous speech recognition by characterizing speech as a sequence of confluent links which are matched with candidate words. The invention also applies to isolated word speech recognition as with continuous speech recognition, except that in such case there are no boundary links.

    SPEECH RECOGNITION SYSTEM FOR NATURAL LANGUAGE TRANSLATION

    Publication No.: CA2091912A1

    Publication date: 1993-11-22

    Application No.: CA2091912

    Application date: 1993-03-18

    Applicant: IBM

    Abstract: A speech recognition system displays a source text of one or more words in a source language. The system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized. The utterance comprises a series of one or more words in a target language different from the source language. A set of one or more speech hypotheses, each comprising one or more words from the target language, is produced. Each speech hypothesis is modeled with an acoustic model. An acoustic match score for each speech hypothesis comprises an estimate of the closeness of a match between the acoustic model of the speech hypothesis and the sequence of coded representations of the utterance. A translation match score for each speech hypothesis comprises an estimate of the probability of occurrence of the speech hypothesis given the occurrence of the source text. A hypothesis score for each hypothesis comprises a combination of the acoustic match score and the translation match score. At least one word of one or more speech hypotheses having the best hypothesis scores is output as a recognition result.
