Abstract:
A speech coder and decoder methodology in which the pitch excitation and codebook excitation source energies are represented by parameters that can be transmitted with minimal transmission capacity. The parameters are the long-term energy value, a short-term correction factor that is applied to the long-term energy value to match the short-term energy, and one or more proportionality factors that specify the relative energy contribution of each excitation source to the short-term energy value.
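The parameter set described above can be illustrated with a short decoder-side sketch. It assumes, for illustration only, that the short-term energy is the long-term energy multiplied by the correction factor, that a single proportionality factor (`pitch_fraction`) splits that energy between the pitch and codebook sources, and that each source's gain is chosen so its scaled excitation vector carries its target energy. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def excitation_gains(
    long_term_energy: float,
    correction_factor: float,
    pitch_fraction: float,
    pitch_vector: Sequence[float],
    codebook_vector: Sequence[float],
) -> Tuple[float, float]:
    """Return (pitch_gain, codebook_gain) derived from the transmitted parameters.

    Hypothetical model: short-term energy = long-term energy * correction factor,
    split as pitch_fraction : (1 - pitch_fraction) between the two sources.
    """
    if not 0.0 <= pitch_fraction <= 1.0:
        raise ValueError("pitch_fraction must lie in [0, 1]")

    # Apply the short-term correction to the long-term energy value.
    short_term_energy = long_term_energy * correction_factor

    # The proportionality factor fixes each source's share of that energy.
    target_pitch = pitch_fraction * short_term_energy
    target_codebook = (1.0 - pitch_fraction) * short_term_energy

    def gain_for(target: float, vector: Sequence[float]) -> float:
        energy = sum(x * x for x in vector)
        # A zero-energy vector cannot be scaled to a nonzero target; use zero gain.
        return math.sqrt(target / energy) if energy > 0.0 else 0.0

    return gain_for(target_pitch, pitch_vector), gain_for(target_codebook, codebook_vector)
```

With this split, only three scalars per frame (long-term energy, correction factor, proportionality factor) need to be sent, and the squared gains times the vector energies always sum back to the short-term energy.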
Abstract:
A user-interactive speech recognition control system is disclosed for recognizing a complete sequence of keywords (e.g., a telephone number such as 123-4567) by entering, verifying, and editing variable-length utterance strings (e.g., 1-2-3; 4-5; 6-7) separated by pauses whose placement the user chooses. The device controller (120) uses timers (124) to monitor the pause time between the partial-sequence digit strings recognized by the speech recognizer (110). When a string of digits is followed by a pause of a predetermined time interval, the recognized digits are repeated back to the user via the speech synthesizer (130). The user can then enter an additional string of digits, and only that string is repeated back after the next pause. Furthermore, the user can correct either the last digit string entered or the entire sequence. Hence, an error in a single digit can be corrected by re-entering only the affected digit string rather than the entire digit sequence. The invention is well suited for use in a hands-free voice command dialing system for a mobile radiotelephone, where vehicular background noise may affect recognition accuracy.
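The pause-delimited entry, replay, and correction behavior described above can be sketched as a small event-driven class. This is a hypothetical illustration, not the disclosed controller: the class and method names are invented, timestamps are passed in explicitly instead of coming from hardware timers, and the 1-second `pause_s` default is an assumed value, since the abstract does not state the predetermined interval.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DigitEntry:
    """Hypothetical sketch of pause-delimited digit-string entry."""

    pause_s: float = 1.0  # assumed predetermined pause interval, in seconds
    strings: List[str] = field(default_factory=list)  # strings already replayed
    _pending: str = ""  # digits heard since the last replayed string
    _last_time: Optional[float] = None  # time of the most recent digit

    def hear_digit(self, digit: str, t: float) -> None:
        # Each recognized digit extends the string currently being spoken
        # and restarts the pause timer.
        self._pending += digit
        self._last_time = t

    def tick(self, now: float) -> Optional[str]:
        """Return the string to replay once the pause timer has expired."""
        if self._pending and self._last_time is not None and now - self._last_time >= self.pause_s:
            replay, self._pending = self._pending, ""
            self.strings.append(replay)
            return replay  # only the newest string is replayed
        return None

    def correct_last(self) -> None:
        # Discard only the most recently entered (replayed) string.
        if self.strings:
            self.strings.pop()

    def clear_all(self) -> None:
        # Discard the entire sequence, including any digits not yet replayed.
        self.strings.clear()
        self._pending = ""

    @property
    def number(self) -> str:
        return "".join(self.strings)
```

For example, after "1-2-3" is replayed and a misrecognized "4-9" is replayed, calling `correct_last()` drops only "49", so the user re-speaks "4-5" and the accumulated number becomes "12345" without repeating "123".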
Abstract:
Described herein is an arrangement and method for processing speech information in a speech recognition system (300). The invention is best employed in a system in which the speech information is depicted as words, each word representing a sequence of frames (510), and in which the recognition system has means (120) for comparing present input speech to a word template that is stored in template memory and derived from one or more previous input words. The invention combines contiguous, acoustically similar frames (512) derived from the previous input word or words into representative frames to form a corresponding reduced word template, stores the reduced word template in template memory in an efficient manner, and compares frames of the present input speech to the representative frames of the reduced word template according to the number of frames combined in each representative frame. In doing so, it generates a measure of similarity between the present input speech and the word template.
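The frame-combining and count-aware comparison described above can be sketched as follows. This is an illustrative simplification under stated assumptions: frames are plain feature vectors, "acoustically similar" is taken to mean a Euclidean distance within a hypothetical `threshold` of the current run's mean, each representative frame is stored as a (mean, count) pair, and the comparison uses a simple linear time alignment on the count-expanded axis rather than dynamic time warping. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Reduced = List[Tuple[List[float], int]]  # (representative frame, frames combined)


def _dist(a: Frame, b: Frame) -> float:
    # Euclidean distance between two feature vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_template(frames: Sequence[Frame], threshold: float) -> Reduced:
    """Merge runs of contiguous frames that lie within `threshold` of the run mean."""
    reduced: Reduced = []
    for frame in frames:
        if reduced and _dist(reduced[-1][0], frame) <= threshold:
            mean, count = reduced[-1]
            # Fold the new frame into the running mean of the current run.
            new_mean = [(m * count + x) / (count + 1) for m, x in zip(mean, frame)]
            reduced[-1] = (new_mean, count + 1)
        else:
            reduced.append(([float(x) for x in frame], 1))
    return reduced


def compare(input_frames: Sequence[Frame], reduced: Reduced) -> float:
    """Average distance from input frames to the reduced template (lower is closer).

    Each representative frame spans `count` positions of the original word, so
    input frame i is mapped linearly onto that expanded time axis.
    """
    total = sum(count for _, count in reduced)
    if not input_frames or total == 0:
        raise ValueError("need at least one input frame and one template frame")
    # Cumulative end position of each representative frame on the original axis.
    ends: List[int] = []
    acc = 0
    for _, count in reduced:
        acc += count
        ends.append(acc)
    score = 0.0
    j = 0
    for i, frame in enumerate(input_frames):
        pos = i * total // len(input_frames)  # always in [0, total)
        while pos >= ends[j]:  # advance to the representative frame covering pos
            j += 1
        score += _dist(frame, reduced[j][0])
    return score / len(input_frames)
```

Storing one mean vector plus an integer count per run is what makes the reduced template smaller than the original, while the count preserves how much of the word's duration each representative frame should cover during comparison.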
Abstract:
A reliable method for terminating a telephone call is disclosed, using a specific sequence of steps performed by a hands-free control system. The invention requires that the call-terminating command sequence be recognized as two separate speech utterances (e.g., TERMINATE (158) and CONVERSATION (158)) that are spoken in the proper sequence (e.g., TERMINATE first, then CONVERSATION), that are separated by no more than a maximum pause time interval (124) between the end of the first utterance and the start of the second (e.g., 300 milliseconds), and that meet predefined speech recognition matching criteria (110). Moreover, the present invention provides the user with a procedure to continue the telephone call in progress if the speech recognizer makes a false recognition or if the user did not intend to speak the command. As a result, the present invention enables a user to disconnect a telephone call by voice command with a high degree of reliability, even under high ambient noise conditions.
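The three acceptance conditions described above (order, maximum pause, matching criteria) can be expressed as a single check. This sketch makes several assumptions not stated in the abstract: the recognizer reports each utterance as a hypothetical `Utterance` record with millisecond timestamps and a match score where higher is better, the matching criteria are reduced to a hypothetical `min_score` threshold, and only the two most recent utterances are examined. The procedure for continuing the call after a false recognition is not modeled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Utterance:
    word: str
    start_ms: int
    end_ms: int
    score: float  # recognizer match score; higher is better (assumed)


def should_terminate(
    utterances: Sequence[Utterance],
    max_pause_ms: int = 300,
    min_score: float = 0.8,
) -> bool:
    """Accept the call-terminating command only if all conditions hold."""
    if len(utterances) < 2:
        return False
    first, second = utterances[-2], utterances[-1]
    # Condition 1: TERMINATE must come first, then CONVERSATION.
    in_order = first.word == "TERMINATE" and second.word == "CONVERSATION"
    # Condition 2: gap from end of first to start of second within the limit.
    pause = second.start_ms - first.end_ms
    within_pause = 0 <= pause <= max_pause_ms
    # Condition 3: both utterances meet the (assumed) matching threshold.
    matched = first.score >= min_score and second.score >= min_score
    return in_order and within_pause and matched
```

Requiring all three conditions at once is what makes an accidental disconnect unlikely: a single misrecognized word, a reversed order, or a long hesitation between the words each leaves the call in progress.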