Abstract:
An apparatus and method for processing a voice message to provide low bit rate speech transmission processes the voice message to generate speech parameters which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The distance values derived are identified by indexes identifying the templates of the set of predetermined templates. The distance values derived are compared, and an index corresponding to a template of the set of predetermined templates having a shortest distance is selected and then transmitted.
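The transform-and-match step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract names only a "predetermined two dimensional matrix transformation function", so the separable DCT-II used here is an assumed stand-in, the squared Euclidean distance is likewise assumed, and all function names are hypothetical.

```python
import math

def dct_1d(values):
    """Unnormalized DCT-II of a list of numbers (assumed transform)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def dct_2d(matrix):
    """Apply the 1-D transform to every row, then to every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equally sized matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def select_template_index(parameter_matrix, templates):
    """Return the index of the template closest to the transformed matrix."""
    transform = dct_2d(parameter_matrix)
    # Each distance value is keyed by the index of its template.
    distances = {i: squared_distance(transform, t) for i, t in enumerate(templates)}
    return min(distances, key=distances.get)
```

Only the selected index would be transmitted, which is what keeps the bit rate low; a receiver would need the same template set to look the index up.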
Abstract:
A method for allowing two screen functions to share a common display area (35) of an interactive display screen (5) associated with an electronic device (1). The method includes automatically determining (21) a required function to be performed in the common display area (35). The required function is typically either a character scribing function (25) or a candidate character display function (24) associated with a scribed character scribed on the common display area (35). The method then controls the common display area (35) so that it performs only the required function.
Abstract:
A speech analyzer (107) compresses a voice message for transmission and includes an LPC analyzer (406) which derives spectral vectors from segments of speech; a memory (1910) which stores predetermined spectral vectors identified by indexes, the indexes also identifying predetermined voicing vectors stored within a receiver; a quantizer (422) which compares the spectral vector derived with the predetermined spectral vectors to select one of the predetermined spectral vectors; and an output buffer for storing the index identifying the predetermined spectral vector selected. The speech analyzer (107) also includes a pitch determiner (414) which includes a pitch function generator (414) which generates a pitch function from a segment of speech. A pitch enhancer (1116) enhances the pitch function of a current segment of speech utilizing the pitch function of one or more sequential segments of speech and a pitch detector (1118) detects the pitch of the current segment of speech.
Abstract:
An electronic device (200) for speech dialog includes functions that receive (205, 105) an utterance that includes an instantiated variable (215), perform voice recognition (210, 115, 120) of the instantiated variable to determine a most likely set of acoustic states (220) and a corresponding sequence of phonemes with stress information (215), and determine prosodic characteristics (272, 274, 276, 130) for a synthesized value of the instantiated variable (236) from the sequence of phonemes with stress information and a set of stored prosody models. The electronic device generates (335, 140) a synthesized value of the instantiated variable using the most likely set of acoustic states and the prosodic characteristics of the instantiated variable.
Abstract:
When scribing characters into an electronic device (1) for displaying at a current character position (24), a character recognition package seeks to recognize the scribed character (22) and produces a first list (50) of candidate characters. The candidate characters in the first list (50) are put into an initial order based on the degree of similarity between the scribed character and the candidate characters. Additionally, a lexicon of possible character pairs is consulted to determine a second list (52) of candidate characters, based on the character immediately preceding the current character position. The two lists are compared and the first list is displayed in a display order which may or may not differ from the initial order, depending on the degree of overlap between the two lists. The invention is particularly useful when scribing complex characters such as Chinese characters, and/or when used in devices with a limited memory, for example pocket devices, such as mobile telephones, personal digital assistants (PDAs), global positioning system (GPS) navigators, or the like.
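One way the comparison of the two lists might work is sketched below. The abstract does not state the reordering rule, so this sketch assumes that candidates appearing in both lists are promoted to the front while the similarity-based initial order is otherwise preserved. The function name and the lexicon layout (a mapping from a preceding character to the set of characters that may follow it) are hypothetical, and Latin letters stand in for complex characters for readability.

```python
def display_order(first_list, preceding_char, pair_lexicon):
    """Reorder similarity-ranked candidates using a character-pair lexicon."""
    # Second list: characters the lexicon allows after the preceding character.
    second_list = pair_lexicon.get(preceding_char, set())
    # Assumed rule: overlapping candidates first, initial order kept within each group.
    overlap = [c for c in first_list if c in second_list]
    rest = [c for c in first_list if c not in second_list]
    return overlap + rest
```

When the two lists do not overlap, the display order equals the initial order, which matches the abstract's remark that the order "may or may not differ".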
Abstract:
A method and apparatus are provided for a low bit rate speech transmission. Speech spectral parameter vectors are generated from a voice message and stored in a sequence of speech spectral parameter vectors within a speech spectral parameter matrix (602). A first index identifying a first speech parameter template (614) corresponding to a first speech spectral parameter vector (604) of the sequence of speech spectral parameter vectors is transmitted. A subsequent speech spectral parameter vector (608) of the sequence is selected and a subsequent speech parameter template (618) is determined having a subsequent index. One or more intervening interpolated speech parameter templates (620) are interpolated between the first speech parameter template (614) and the subsequent speech parameter template (618). The one or more intervening speech spectral parameter vectors (606) are compared to the corresponding one or more intervening interpolated speech parameter templates (620) to derive a distance. The subsequent index is transmitted when the distance derived is less than or equal to a predetermined distance.
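The interpolation test described above can be illustrated with a short sketch. It assumes linear interpolation between the two templates and a summed squared Euclidean distance, neither of which the abstract specifies; the function names are hypothetical.

```python
def interpolate_templates(first, subsequent, count):
    """Return `count` templates linearly interpolated strictly between two templates."""
    steps = count + 1
    return [
        [a + (b - a) * step / steps for a, b in zip(first, subsequent)]
        for step in range(1, steps)
    ]

def interpolation_distance(intervening_vectors, first, subsequent):
    """Summed squared error between intervening vectors and interpolated templates."""
    interpolated = interpolate_templates(first, subsequent, len(intervening_vectors))
    return sum(
        (x - y) ** 2
        for vector, template in zip(intervening_vectors, interpolated)
        for x, y in zip(vector, template)
    )

def should_transmit(intervening_vectors, first, subsequent, max_distance):
    """True when the distance is less than or equal to the predetermined distance."""
    return interpolation_distance(intervening_vectors, first, subsequent) <= max_distance
```

When `should_transmit` returns True, the intervening vectors need not be sent, because a receiver holding the same templates could regenerate them by the same interpolation.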
Abstract:
A method for animating an image is useful for animating avatars using real-time speech data. According to one aspect, the method includes identifying an upper facial part and a lower facial part of the image (step 705); animating the lower facial part based on speech data that are classified according to a reduced vowel set (step 710); tilting both the upper facial part and the lower facial part using a coordinate transformation model (step 715); and rotating both the upper facial part and the lower facial part using an image warping model (step 720).
Abstract:
Error detection and correction of a received message, such as a digitized voice message, is achieved by generating (318) interpolated vectors for each error vector corresponding to a codebook index in a sequence of codebook indexes representing parameters of portions of the message. A plurality of error corrected candidate vectors for the vector corresponding to the codebook index in error are generated (322, 324, 326) by flipping one bit in a sequence of bits representing the codebook index in error. The error corrected candidate vector which has a minimal difference from its corresponding interpolated vector is used (338) to replace the error vector. In the case of digital voice, the vectors are spectral vectors which represent spectral information for a time sample of a voice message. An ordering property of vector components is exploited to detect errors in a received codebook index without parity bits.
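The bit-flip correction step can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the interpolated vector is taken as the midpoint of the neighboring vectors, the difference measure is squared Euclidean distance, and all names are hypothetical. Error detection via the ordering property is outside the scope of this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def interpolate_between(previous_vector, next_vector):
    """Assumed interpolation: midpoint of the two neighboring vectors."""
    return [(p + n) / 2 for p, n in zip(previous_vector, next_vector)]

def single_bit_flips(index, bit_width):
    """All indexes that differ from `index` in exactly one bit."""
    return [index ^ (1 << bit) for bit in range(bit_width)]

def correct_index(bad_index, interpolated, codebook, bit_width):
    """Pick the one-bit-flip candidate whose vector is nearest the interpolated vector."""
    candidates = [i for i in single_bit_flips(bad_index, bit_width) if i < len(codebook)]
    return min(candidates, key=lambda i: squared_distance(codebook[i], interpolated))
```

The candidate set has only as many entries as the index has bits, so the search stays small even for a large codebook.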
Abstract:
There is described a method (200) for text to speech synthesis. The method (200) includes receiving (220) a text string and selecting at least one word from the string. The word is then segmented (240) into sub-words forming a sub-word sequence, with at least one of the sub-words comprising at least two letters. A step of identifying (250) identifies phonemes for the sub-words, and a step of concatenating (260) concatenates the phonemes into a phoneme sequence. Speech synthesis (280) is then performed on the phoneme sequence.
Abstract:
A method (20) for guiding a user of an electronic device (1) to select keys on a keyboard (31) of a touch screen (5) on the device. The method includes receiving (22) a reference alphanumeric character input at the keyboard (32), the reference alphanumeric character identifying a first part of a syllable. The method then searches (23) a database of valid syllables or words to identify valid alphanumeric characters that can immediately follow the reference alphanumeric character. Thereafter, the method emphasizes keys (24) on the keyboard (32) that represent the valid alphanumeric characters, thereby guiding the user to select one of the keys representing one of said valid alphanumeric characters.
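The search for valid following characters can be sketched as a prefix lookup. This assumes the database is a plain list of syllable strings and that the characters entered so far form a prefix; the function names and the sample syllables in the usage note are hypothetical.

```python
def valid_next_characters(prefix, syllables):
    """Characters that can immediately follow `prefix` in some valid syllable."""
    return {
        s[len(prefix)]
        for s in syllables
        if s.startswith(prefix) and len(s) > len(prefix)
    }

def keys_to_emphasize(prefix, syllables, keyboard_keys):
    """Keys, in keyboard order, that represent valid following characters."""
    valid = valid_next_characters(prefix, syllables)
    return [key for key in keyboard_keys if key in valid]
```

For example, with the syllables `["zhang", "zi", "zu", "ba"]` and the reference character `"z"`, the keys for `h`, `i`, and `u` would be emphasized.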