Abstract:
A method and system for editing words that have been misrecognized. The system allows a speaker to specify the number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will again be recognized as the same misrecognized word. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before it is transferred to the word processor.
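The correction-window behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the fixed row height in pixels, and the idea that the recognizer supplies (word, score) pairs are all hypothetical and do not come from the abstract.

```python
from typing import Iterable, List, Set, Tuple


def rows_that_fit(window_height_px: int, row_height_px: int = 20) -> int:
    """Number of alternative words the resized window can show (assumed fixed row height)."""
    return max(0, window_height_px // row_height_px)


def correction_candidates(
    scored: Iterable[Tuple[str, float]],
    rejected: Set[str],
    window_height_px: int,
    row_height_px: int = 20,
) -> List[str]:
    """Pick the alternatives to display in the correction window.

    scored   -- hypothetical (word, score) pairs from the recognizer, higher is better
    rejected -- words the speaker has already marked as misrecognized; these are
                excluded so a respoken utterance cannot resolve to them again
    """
    n = rows_that_fit(window_height_px, row_height_px)
    allowed = [(w, s) for w, s in scored if w not in rejected]
    # Keep the n best-scoring alternatives, then present them alphabetically.
    best = sorted(allowed, key=lambda ws: ws[1], reverse=True)[:n]
    return sorted(w for w, _ in best)
```

Enlarging the window raises the row count, so more alternatives are shown; the alphabetical ordering is applied only after the best candidates have been selected.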
Abstract:
A method of recognizing speech, comprising:
receiving input data indicative of the speech to be recognized; detecting pauses in the speech, based on the input data, to identify a phrase duration; generating a plurality of phrase hypotheses representative of likely word phrases represented by the input data between the pauses detected; comparing a word duration associated with each word in each phrase hypothesis, based on a number of words in the phrase hypothesis and based on the phrase duration, with an expected word duration for a phrase having a number of words equal to the number of words in the phrase hypothesis; and assigning a score to each phrase hypothesis based on the comparison of the word duration with the expected word duration to obtain a most likely phrase hypothesis represented by the input data.
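As a rough illustration of the duration-based scoring step, the sketch below derives a per-word duration from the phrase duration and the word count of each hypothesis, compares it with an expected duration for phrases of that length, and adds the result to a base score. The Gaussian penalty, the table of expected durations, and the base log scores are assumptions made for the example; the abstract does not specify how the comparison is scored.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical (mean, standard deviation) of per-word duration in seconds,
# keyed by the number of words in the phrase. Values are illustrative only.
EXPECTED: Dict[int, Tuple[float, float]] = {
    1: (0.60, 0.20),
    2: (0.45, 0.15),
    3: (0.40, 0.12),
    4: (0.35, 0.10),
}

Hypothesis = Tuple[Sequence[str], float]  # (words, base log score from the recognizer)


def duration_log_score(num_words: int, phrase_duration: float) -> float:
    """Gaussian log-likelihood of the observed per-word duration (assumed model)."""
    if num_words < 1:
        raise ValueError("a phrase hypothesis needs at least one word")
    # Fall back to the longest tabulated phrase length when the count is not listed.
    mean, std = EXPECTED.get(num_words, EXPECTED[max(EXPECTED)])
    per_word = phrase_duration / num_words
    z = (per_word - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def best_hypothesis(hypotheses: List[Hypothesis], phrase_duration: float) -> Hypothesis:
    """Return the hypothesis with the highest combined base and duration score."""
    return max(
        hypotheses,
        key=lambda h: h[1] + duration_log_score(len(h[0]), phrase_duration),
    )
```

With equal base scores, a 0.9-second phrase favors a two-word hypothesis over a four-word one here, because 0.45 seconds per word matches the assumed expectation for two-word phrases.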
Abstract:
A computer system for linearly encoding a pronunciation prefix tree. The pronunciation prefix tree has nodes such that each non-root, non-leaf node represents a phoneme and each leaf node represents a word formed by the phonemes represented by the non-leaf nodes in the path from the root node to the leaf node. Each leaf node has a probability associated with the word of the leaf node. The computer system creates a tree node dictionary containing an indication of the phonemes that compose each word. The computer system then orders the child nodes of each non-leaf node based on the highest probability of the descendant leaf nodes of the child node. Then, for each non-leaf node, the computer system sets the probability of the non-leaf node to a probability based on the probability of its child nodes, and for each node, sets a factor of the node to the probability of the node divided by the probability of the parent node of the node. Finally, the computer system generates an encoded pronunciation entry for each leaf node of the pronunciation prefix tree. Each encoded pronunciation entry indicates the word represented by the leaf node and contains the factor of a nearest ancestor node with a factor other than 1.0.
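The encoding steps can be sketched as below. Several choices are assumptions rather than details taken from the abstract: a non-leaf node's probability is taken to be the maximum of its children's probabilities, the root's parent probability is treated as 1.0, the ancestor search starts at the leaf's parent (a literal reading of "ancestor"), and each entry is reduced to a (word, factor) pair. The names `Node`, `build_tree`, `set_probabilities`, `set_factors`, and `encode` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                          # phoneme, or the word itself for a leaf
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    is_leaf: bool = False
    prob: float = 0.0
    factor: float = 1.0


def build_tree(lexicon: Dict[str, Tuple[List[str], float]]) -> Node:
    """lexicon maps word -> (phoneme list, word probability)."""
    root = Node("<root>")
    for word, (phonemes, prob) in lexicon.items():
        node = root
        for ph in phonemes:
            nxt = next((c for c in node.children
                        if not c.is_leaf and c.label == ph), None)
            if nxt is None:
                nxt = Node(ph, parent=node)
                node.children.append(nxt)
            node = nxt
        node.children.append(Node(word, parent=node, is_leaf=True, prob=prob))
    return root


def set_probabilities(node: Node) -> None:
    """Order children by best descendant leaf; assumed rule: prob = max over children."""
    if node.is_leaf or not node.children:
        return
    for child in node.children:
        set_probabilities(child)
    node.children.sort(key=lambda c: c.prob, reverse=True)
    node.prob = node.children[0].prob


def set_factors(node: Node) -> None:
    """factor = node probability / parent probability (root parent assumed 1.0)."""
    parent_prob = node.parent.prob if node.parent else 1.0
    node.factor = node.prob / parent_prob
    for child in node.children:
        set_factors(child)


def encode(root: Node) -> List[Tuple[str, float]]:
    """One (word, factor) entry per leaf, visited depth-first in child order."""
    entries: List[Tuple[str, float]] = []

    def visit(node: Node) -> None:
        if node.is_leaf:
            anc = node.parent
            while anc is not None and anc.factor == 1.0:
                anc = anc.parent
            entries.append((node.label, anc.factor if anc else 1.0))
            return
        for child in node.children:
            visit(child)

    visit(root)
    return entries


def encode_lexicon(lexicon: Dict[str, Tuple[List[str], float]]) -> List[Tuple[str, float]]:
    root = build_tree(lexicon)
    set_probabilities(root)
    set_factors(root)
    return encode(root)
```

Under the max rule, the first child of every node has a factor of exactly 1.0, so only the branch points where a less probable alternative begins carry a factor other than 1.0. Multiplying the factors along the path from the root to a leaf reproduces that leaf's word probability.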