Abstract:
A method and apparatus for protecting compressed speech in very low bit rate voice messaging, comprising the steps of: analyzing compressed input speech data to discriminate among the data, such as heading, pitch, energy, spectral and timing information (506); providing a plurality of channel encoding methods, wherein the information most error sensitive for speech replication is given the coding method with the greatest protection and sequentially less error sensitive information is encoded using methods of sequentially less protection (510), thereby incurring significantly less overhead in the overall channel encoding process than under a standard channel encoding scheme; passing the output of each encoder to a multiplexer (512), which multiplexes the plurality of channel encoded data and sends the encoded data via a transmission channel to a de-multiplexer, wherein the channel encoded data is separated and passed to a plurality of decoders, each designed to decode the data of its paired encoder; passing the decoded data to a digital to analog converter, wherein the digital data is converted to analog data; and passing the analog data to a speech synthesizer, which replicates the input speech.
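As a rough illustration of the unequal-protection idea, the sketch below assigns a hypothetical repetition-code strength to each parameter class and multiplexes the encoded fields in a fixed order. The field names, rates, and the use of simple repetition coding are assumptions for illustration only; a real system would use convolutional or block codes of differing strength.

```python
# Sketch of unequal error protection for compressed speech parameters.
# Field names and repetition rates are illustrative assumptions, not the
# coding methods of the patent.

from typing import Dict, List

# Hypothetical sensitivity classes: the most error-sensitive fields get the
# strongest (most redundant) protection.
PROTECTION = {
    "pitch":    3,   # repeat each bit 3 times (strongest protection)
    "energy":   3,
    "spectral": 2,
    "timing":   1,   # no redundancy (weakest protection)
}

def encode_field(bits: List[int], repeat: int) -> List[int]:
    """Repetition-encode one parameter field."""
    return [b for b in bits for _ in range(repeat)]

def channel_encode(frame: Dict[str, List[int]]) -> List[int]:
    """Encode each field with its own strength, then multiplex in a fixed order."""
    stream: List[int] = []
    for name in ("pitch", "energy", "spectral", "timing"):
        stream.extend(encode_field(frame[name], PROTECTION[name]))
    return stream

if __name__ == "__main__":
    frame = {"pitch": [1, 0, 1], "energy": [1, 1],
             "spectral": [0, 1, 1, 0], "timing": [1, 0]}
    print(channel_encode(frame))
```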
Abstract:
There is described a method (200) for text to speech synthesis. The method (200) includes receiving (220) a text string and selecting at least one word from the string. A step of segmenting (240) the word into sub-words follows, forming a sub-word sequence with at least one of the sub-words comprising at least two letters. An identifying step (250) identifies phonemes for the sub-words, and step (260) concatenates the phonemes into a phoneme sequence. Speech synthesis (280) is then performed on the phoneme sequence.
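A minimal sketch of the segment-then-lookup flow, assuming a toy sub-word-to-phoneme dictionary and greedy longest-match segmentation; a real system would use a far larger lexicon and a trained segmenter rather than this hypothetical lookup.

```python
# Toy sub-word lexicon; entries and phoneme labels are assumed for illustration.
SUBWORD_PHONEMES = {
    "hel": ["HH", "EH", "L"],
    "lo":  ["L", "OW"],
    "o":   ["OW"],
    "h":   ["HH"],
    "e":   ["EH"],
    "l":   ["L"],
}

def segment(word: str) -> list[str]:
    """Greedy longest-match segmentation of a word into known sub-words."""
    subwords, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORD_PHONEMES:
                subwords.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no sub-word covers {word[i:]!r}")
    return subwords

def word_to_phonemes(word: str) -> list[str]:
    """Segment the word, look up phonemes per sub-word, and concatenate."""
    phonemes = []
    for sw in segment(word):
        phonemes.extend(SUBWORD_PHONEMES[sw])
    return phonemes

if __name__ == "__main__":
    print(word_to_phonemes("hello"))   # ['HH', 'EH', 'L', 'L', 'OW']
```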
Abstract:
A method and apparatus is provided for a low bit rate speech transmission. Speech spectral parameter vectors are generated from a voice message and stored in a sequence of speech spectral parameter vectors within a speech spectral parameter matrix. A first index identifying a first speech parameter template corresponding to a first speech spectral parameter vector of the sequence of speech spectral parameter vectors is transmitted. A subsequent speech spectral parameter vector of the sequence is selected and a subsequent speech parameter template is determined having a subsequent index. One or more intervening interpolated speech parameter templates are interpolated between the first speech parameter template and the subsequent speech parameter template. The one or more intervening speech spectral parameter vectors are compared to the corresponding one or more intervening interpolated speech parameter templates to derive a distance. The subsequent index is transmitted when the distance derived is less than or equal to a predetermined distance.
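The sketch below illustrates the interpolation test under stated assumptions: a fixed codebook of speech parameter templates, linear interpolation between the first and subsequent templates, and Euclidean distance. The abstract does not commit to these particulars.

```python
# Illustrative sketch of the "skip intervening frames" test. Codebook size,
# frame spacing, and the Euclidean distance measure are assumptions.

import numpy as np

def nearest_template(vec: np.ndarray, codebook: np.ndarray) -> int:
    """Index of the closest codebook template to a spectral parameter vector."""
    return int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))

def can_skip_intervening(frames: np.ndarray, codebook: np.ndarray,
                         first: int, subsequent: int, threshold: float) -> bool:
    """True if interpolated templates approximate the intervening spectral
    vectors closely enough that only the subsequent index needs to be sent."""
    t0 = codebook[nearest_template(frames[first], codebook)]
    t1 = codebook[nearest_template(frames[subsequent], codebook)]
    n = subsequent - first
    for k in range(1, n):
        interp = t0 + (t1 - t0) * (k / n)      # intervening interpolated template
        if np.linalg.norm(frames[first + k] - interp) > threshold:
            return False
    return True

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(16, 10))        # 16 templates, 10-dim vectors
    frames = rng.normal(size=(8, 10))           # a short run of spectral frames
    print(can_skip_intervening(frames, codebook, 0, 4, threshold=5.0))
```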
Abstract:
There is described a method (200) for providing a representation of a waveform for a word. The method (200) includes providing (220) transcriptions representing phrases and corresponding sampled and digitized utterance waveforms of the transcriptions, the transcriptions having marked natural phrase boundaries. The method (200) also provides for clustering (230) parts of the waveforms corresponding to identical words in the transcriptions to provide groups of waveforms for the identical words with similar prosodic features, the clustering being effected when the identical words are positioned in the transcriptions at like locations relative to natural phrase boundaries. Each of the groups of waveforms for the identical words is then processed (240) to provide a representative utterance waveform for each group.
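As an illustration only, the sketch below groups waveform excerpts of the same word by a position tag relative to a phrase boundary and picks a medoid under crude prosodic features (duration and RMS energy). The per-group processing, the feature set, and the position tags are assumptions; the abstract leaves them open.

```python
# Hedged sketch: cluster same-word waveforms by phrase-boundary position, then
# pick a medoid as the representative utterance for each group.

import numpy as np
from collections import defaultdict

def prosodic_features(wave: np.ndarray, rate: int) -> np.ndarray:
    """Crude prosodic descriptor: duration in seconds and RMS energy."""
    return np.array([len(wave) / rate, float(np.sqrt(np.mean(wave ** 2)))])

def representative_waveforms(samples, rate=8000):
    """samples: list of (word, position_tag, waveform), where position_tag is
    e.g. 'phrase_final' or 'phrase_medial' relative to marked boundaries."""
    groups = defaultdict(list)
    for word, pos, wave in samples:
        groups[(word, pos)].append(wave)

    reps = {}
    for key, waves in groups.items():
        feats = np.stack([prosodic_features(w, rate) for w in waves])
        centroid = feats.mean(axis=0)
        medoid = int(np.argmin(np.linalg.norm(feats - centroid, axis=1)))
        reps[key] = waves[medoid]      # representative utterance for this group
    return reps

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = [("hello", "phrase_final", rng.normal(size=4000)),
            ("hello", "phrase_final", rng.normal(size=4200)),
            ("hello", "phrase_medial", rng.normal(size=3000))]
    reps = representative_waveforms(data)
    print({k: len(v) for k, v in reps.items()})
```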
Abstract:
Error detection and correction of a received message, such as a digitized voice message, is achieved by generating (318) interpolated vectors for each vector in error corresponding to a codebook index in a sequence of codebook indexes representing parameters of portions of the message. A plurality of error corrected candidate vectors for the vector corresponding to the codebook index in error are generated (322, 324, 326) by flipping one bit at a time in the sequence of bits representing the codebook index in error. The error corrected candidate vector which has a minimal difference from its corresponding interpolated vector is used (338) to replace the vector in error. In the case of digital voice, the vectors are spectral vectors which represent spectral information for a time sample of a voice message. An ordering property of the vector components is exploited to detect errors in a received codebook index without parity bits.
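A minimal sketch of the bit-flip candidate search, assuming a codebook of spectral vectors and linear interpolation between the error-free neighbouring frames; the ordering-property error detection mentioned above is not shown.

```python
# Sketch of single-bit-flip correction against an interpolated target vector.
# The codebook, 4-bit index width, and midpoint interpolation are assumptions.

import numpy as np

def correct_index(prev_vec: np.ndarray, next_vec: np.ndarray,
                  received_index: int, codebook: np.ndarray, n_bits: int) -> int:
    """Flip each bit of the received index and keep the candidate whose
    codebook vector is closest to the interpolated vector for that frame."""
    interpolated = 0.5 * (prev_vec + next_vec)   # interpolated replacement target
    best_index, best_dist = received_index, np.inf
    for bit in range(n_bits):
        candidate = received_index ^ (1 << bit)  # one-bit-flipped candidate index
        if candidate >= len(codebook):
            continue
        dist = np.linalg.norm(codebook[candidate] - interpolated)
        if dist < best_dist:
            best_index, best_dist = candidate, dist
    return best_index

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    codebook = rng.normal(size=(16, 10))         # 4-bit indexes, 10-dim spectral vectors
    true_index = 5
    corrupted = true_index ^ 0b0100              # simulate a single bit error
    prev_vec = codebook[true_index] + 0.01
    next_vec = codebook[true_index] - 0.01
    print(correct_index(prev_vec, next_vec, corrupted, codebook, n_bits=4))
```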
Abstract:
A computationally non-intensive method for classifying real-time speech data is useful for improved animations of avatars. The method includes identifying a voiced speech segment of the speech data (step 410). A high-amplitude spectrum is then determined by performing a spectral analysis on a high-amplitude component of the voiced speech segment (step 415). The high-amplitude spectrum is then classified as a vowel phoneme, where the vowel phoneme is selected from a reduced vowel set (step 440).
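The classification step might look like the sketch below, assuming a reduced vowel set of {'a', 'i', 'u'}, the highest-energy frame as the high-amplitude component, and precomputed reference spectra compared by cosine similarity; none of these particulars come from the abstract.

```python
# Sketch of vowel classification from the high-amplitude spectrum of a voiced
# segment. Vowel set, frame length, and reference spectra are assumptions.

import numpy as np

def high_amplitude_spectrum(voiced: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Magnitude spectrum of the highest-energy frame of a voiced segment."""
    frames = [voiced[i:i + frame_len]
              for i in range(0, len(voiced) - frame_len, frame_len)]
    peak = max(frames, key=lambda f: float(np.sum(f ** 2)))  # high-amplitude component
    return np.abs(np.fft.rfft(peak * np.hanning(frame_len)))

def classify_vowel(spectrum: np.ndarray, references: dict) -> str:
    """Nearest reference spectrum (cosine similarity) from the reduced vowel set."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(references, key=lambda v: cos(spectrum, references[v]))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    refs = {v: np.abs(rng.normal(size=129)) for v in ("a", "i", "u")}  # assumed references
    segment = rng.normal(size=2048)                                    # stand-in voiced segment
    print(classify_vowel(high_amplitude_spectrum(segment), refs))
```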
Abstract:
A method and system for compressing handwritten character templates. The system includes a codebook generator module (105) for generating a codebook (125). The codebook (125) includes vectors defining the centers of clusters (115) of uncompressed model character feature vectors (110) provided from model character templates. A template compression module (120) is connected to the codebook generator module (105) for comparing the uncompressed model character feature vectors (110) with the codebook (125) to provide compressed templates of model characters (135). Optionally, a template matching module (140) is connected to the template compression module (120) for providing candidate characters (150) by comparing the distances between uncompressed input character feature vectors (130) and the model character templates.
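A hedged sketch of the codebook-based compression, assuming k-means clustering as the codebook generator and nearest-neighbour quantization in the template compression module; vector dimensions and codebook size are illustrative.

```python
# Sketch of codebook generation and template compression via vector
# quantization. The k-means procedure and all sizes are assumptions.

import numpy as np

def build_codebook(feature_vectors: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Simple k-means: the cluster centers become the codebook vectors."""
    rng = np.random.default_rng(0)
    centers = feature_vectors[rng.choice(len(feature_vectors), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(feature_vectors[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            members = feature_vectors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def compress_template(template: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each uncompressed feature vector with its nearest codebook index."""
    return np.argmin(np.linalg.norm(template[:, None] - codebook[None], axis=2), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    model_vectors = rng.normal(size=(500, 8))    # uncompressed model character features
    codebook = build_codebook(model_vectors, k=32)
    template = rng.normal(size=(20, 8))          # one model character template
    print(compress_template(template, codebook)) # 20 codebook indexes
```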
Abstract:
An apparatus and method for processing a voice message to provide low bit rate speech transmission processes the voice message to generate speech parameters which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The distance values derived are identified by indexes identifying the templates of the set of predetermined templates. The distance values derived are compared, and an index corresponding to a template of the set of predetermined templates having a shortest distance is selected and then transmitted.
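As a sketch, the code below uses a 2-D DCT as the predetermined two-dimensional matrix transformation (414) and Frobenius distance to a set of stored transform-domain templates; the specific transform, distance measure, and template count are assumptions, not details from the abstract.

```python
# Sketch of transforming a parameter matrix and selecting the nearest
# predetermined template. The 2-D DCT and Frobenius distance are assumptions.

import numpy as np
from scipy.fft import dctn

def transform_matrix(param_matrix: np.ndarray) -> np.ndarray:
    """Apply a two-dimensional transform to the frame-by-parameter matrix."""
    return dctn(param_matrix, norm="ortho")

def nearest_template_index(transform: np.ndarray, templates: np.ndarray) -> int:
    """Index of the stored template with the shortest distance to the transform."""
    dists = np.linalg.norm(templates - transform[None], axis=(1, 2))
    return int(np.argmin(dists))

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    params = rng.normal(size=(12, 10))           # 12 frames x 10 speech parameters
    templates = rng.normal(size=(64, 12, 10))    # 64 predetermined transform templates
    print(nearest_template_index(transform_matrix(params), templates))
```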