14.
    Invention patent
    Unknown

    Publication No.: DK176383B1

    Publication date: 2007-10-22

    Application No.: DK438189

    Application date: 1989-09-05

    Applicant: MOTOROLA INC

    Inventor: GERSON IRA ALAN

    Abstract: An improved excitation vector generation and search technique (FIG. 1) is described for a code-excited linear prediction (CELP) speech coder (100) using a codebook of excitation code vectors. A set of M basis vectors Vm(n) is used along with the excitation signal codewords (i) to generate the codebook of excitation vectors ui(n) according to a "vector sum" technique (120) of converting the selector codewords into a plurality of interim data signals, multiplying the set of M basis vectors by the interim data signals, and summing the resultant vectors to produce the set of 2^M codebook vectors. The entire codebook of 2^M possible excitation vectors is efficiently searched by using the vector sum generation technique with the M basis vectors, without ever having to generate and evaluate each of the 2^M code vectors themselves. Furthermore, only M basis vectors need to be stored in memory (114), as opposed to all 2^M code vectors.
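
    The vector sum construction in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how codeword bits become interim data signals, so the sketch assumes each bit maps to +1 or -1; the normalized-correlation score and the omission of any perceptual weighting filter are also assumptions, and the names signs_for, build_vector, and search are hypothetical.

```python
def signs_for(codeword, m_bits):
    # Interim data signals: one sign per codeword bit
    # (assumed mapping: bit 1 -> +1, bit 0 -> -1).
    return [1 if (codeword >> m) & 1 else -1 for m in range(m_bits)]


def build_vector(codeword, basis):
    # Vector sum: weight each basis vector by its sign, then add sample by sample.
    signs = signs_for(codeword, len(basis))
    length = len(basis[0])
    return [sum(s * v[n] for s, v in zip(signs, basis)) for n in range(length)]


def search(target, basis):
    # Precompute M correlations and M*M cross terms once, so that no
    # code vector is ever built during the search itself.
    m_bits = len(basis)
    corr = [sum(t * x for t, x in zip(target, v)) for v in basis]
    cross = [[sum(a * b for a, b in zip(vi, vj)) for vj in basis] for vi in basis]

    best_code, best_score = None, -1.0
    for code in range(2 ** m_bits):
        s = signs_for(code, m_bits)
        # Correlation of the target with code vector `code`.
        c = sum(si * ri for si, ri in zip(s, corr))
        # Energy of code vector `code`.
        g = sum(s[i] * s[j] * cross[i][j]
                for i in range(m_bits) for j in range(m_bits))
        # Assumed selection criterion: squared correlation over energy.
        if g > 0 and c * c / g > best_score:
            best_code, best_score = code, c * c / g
    return best_code, best_score
```

    Only the M basis vectors are stored; any of the 2^M code vectors can be rebuilt on demand with build_vector. A codeword and its bitwise complement produce vectors of opposite sign and therefore tie under this score, so the sketch simply keeps the first one found.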

    A method and apparatus for converting text into audible signals using a neural network

    Publication No.: AU675389B2

    Publication date: 1997-01-30

    Application No.: AU2104095

    Application date: 1995-03-21

    Applicant: MOTOROLA INC

    Abstract: Text may be converted to audible signals, such as speech, by first training a neural network 106 using recorded audio messages 204. To begin the training, the recorded audio messages are converted into a series of audio frames 205 having a fixed duration 213. Then, each audio frame is assigned a phonetic representation 203 and a target acoustic representation 208, where the phonetic representation 203 is a binary word that represents the phone and articulation characteristics of the audio frame, while the target acoustic representation 208 is a vector of audio information such as pitch and energy. After training, the neural network 106 is used in conversion of text into speech. First, text that is to be converted is translated to a series of phonetic frames 401 of the same form as the phonetic representations 203 and having the fixed duration 213. Then the neural network produces acoustic representations in response to context descriptions 207 that include some of the phonetic frames 401. The acoustic representations are then converted into a speech waveform by a synthesizer 107.
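
    As a rough illustration of the fixed-duration framing and context descriptions mentioned in the abstract above, the sketch below expands timed phones into equal-length frames and then groups each frame with its neighbours. Everything specific here is an assumption: the abstract gives no frame length, describes the phonetic representation as a binary word (plain strings stand in for it here), and does not say which frames a context description contains. The names to_frames and context_windows and the "sil" padding label are hypothetical.

```python
def to_frames(phones, frame_ms):
    # Expand (phone, duration_ms) pairs into one label per fixed-duration frame.
    frames = []
    for phone, duration_ms in phones:
        count = max(1, round(duration_ms / frame_ms))
        frames.extend([phone] * count)
    return frames


def context_windows(frames, radius, pad="sil"):
    # For each frame, collect the frame plus `radius` neighbours on each side
    # (assumed form of a context description), padding at the edges.
    padded = [pad] * radius + frames + [pad] * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(frames))]
```

    In a full system each window would be encoded numerically and passed to the trained network, which would return one acoustic vector (for example pitch and energy) per frame.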

    Optimal method of data reduction in a speech recognition system

    Publication No.: HK40596A

    Publication date: 1996-03-15

    Application No.: HK40596

    Application date: 1996-03-07

    Applicant: MOTOROLA INC

    Abstract: The present invention describes a method and arrangement for reducing a sequence of initial frames into a reduced set of representative frames by combining the initial frames into a plurality of representative frames, the combining process including generating a distortion measure associated with each representative frame and comparing each distortion measure to a distortion threshold. From these representative frames, a set of mutually exclusive frames is determined to minimize the number of representative frames, whereby each representative frame in the set represents a unique set of contiguous initial frames and has an associated distortion measure which does not exceed the distortion threshold.
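
    One way to picture the reduction described in the abstract above is a small dynamic program over contiguous runs of frames. This is a sketch under assumptions, not the claimed method: the abstract does not specify how frames are combined or how distortion is measured, so the example averages each run and uses the largest squared distance from that average. The names mean_frame, distortion, and reduce_frames are hypothetical.

```python
def mean_frame(frames):
    # Assumed combining rule: element-wise average of the run.
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def distortion(frames, rep):
    # Assumed distortion measure: worst squared distance to the representative.
    return max(sum((a - b) ** 2 for a, b in zip(f, rep)) for f in frames)


def reduce_frames(frames, threshold):
    # best[end] = fewest representative frames covering frames[0:end].
    n = len(frames)
    best = [0] + [float("inf")] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            run = frames[start:end]
            ok = distortion(run, mean_frame(run)) <= threshold
            if ok and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start

    # Walk back through the chosen cut points to recover the runs.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append((start, end, mean_frame(frames[start:end])))
        end = start
    segments.reverse()
    return segments
```

    Each returned tuple is (start, end, representative), covering the contiguous initial frames start through end - 1. The runs do not overlap, and every run stays within the threshold, because a single frame always has zero distortion under this measure.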

    A Method and Apparatus for Converting Text Into Audible Signals Using a Neural Network

    Publication No.: CA2161540A1

    Publication date: 1995-11-09

    Application No.: CA2161540

    Application date: 1995-03-21

    Applicant: MOTOROLA INC

    Abstract: Text may be converted to audible signals, such as speech, by first training a neural network 106 using recorded audio messages 204. To begin the training, the recorded audio messages are converted into a series of audio frames 205 having a fixed duration 213. Then, each audio frame is assigned a phonetic representation 203 and a target acoustic representation 208, where the phonetic representation 203 is a binary word that represents the phone and articulation characteristics of the audio frame, while the target acoustic representation 208 is a vector of audio information such as pitch and energy. After training, the neural network 106 is used in conversion of text into speech. First, text that is to be converted is translated to a series of phonetic frames 401 of the same form as the phonetic representations 203 and having the fixed duration 213. Then the neural network produces acoustic representations in response to context descriptions 207 that include some of the phonetic frames 401. The acoustic representations are then converted into a speech waveform by a synthesizer 107.

18.
    Invention patent
    Unknown

    Publication No.: DE3853294D1

    Publication date: 1995-04-13

    Application No.: DE3853294

    Application date: 1988-08-24

    Applicant: MOTOROLA INC

    Abstract: A reliable method for terminating a telephone call is disclosed using a specific sequence of steps performed by a hands-free control system. The invention requires that the call terminating command sequence be recognized as two separate speech utterances (e.g., TERMINATE (158) and CONVERSATION (158)), in proper sequence (e.g., TERMINATE first, then CONVERSATION), with a maximum pause time interval (124) between the end of the first utterance and the start of the second utterance (e.g., 300 milliseconds), and which meet predefined speech recognition matching criteria (110). Moreover, the present invention provides the user with a procedure to continue the telephone call in progress should the speech recognizer make a false recognition or if the user did not intend to speak the proper command. As a result, the present invention enables a user to disconnect a telephone call by voice command with a high degree of reliability, even under high ambient noise conditions.
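
    The acceptance conditions listed in the abstract above (two utterances, correct order, bounded pause, matching criteria) can be expressed as a single check, sketched below. The 300 ms pause and the two command words come from the abstract's examples; the Utterance type, the score field, and the MIN_SCORE threshold are hypothetical stand-ins for the unspecified "matching criteria".

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    word: str
    start_ms: int
    end_ms: int
    score: float  # hypothetical recognizer confidence in [0, 1]


MAX_PAUSE_MS = 300  # example value given in the abstract
MIN_SCORE = 0.8     # assumed stand-in for the matching criteria


def is_terminate_command(first, second):
    # Pause runs from the end of the first utterance to the start of the second.
    pause = second.start_ms - first.end_ms
    return (
        first.word == "TERMINATE"
        and second.word == "CONVERSATION"
        and 0 <= pause <= MAX_PAUSE_MS
        and first.score >= MIN_SCORE
        and second.score >= MIN_SCORE
    )
```

    The confirmation step that lets the user keep the call going after a false recognition is not modelled here; it would sit between a True result from this check and the actual disconnect.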

20.
    Invention patent
    Unknown

    Publication No.: DE3688747D1

    Publication date: 1993-08-26

    Application No.: DE3688747

    Application date: 1986-12-18

    Applicant: MOTOROLA INC

    Abstract: The present invention describes a method and arrangement for reducing a sequence of initial frames into a reduced set of representative frames by combining the initial frames into a plurality of representative frames, the combining process including generating a distortion measure associated with each representative frame and comparing each distortion measure to a distortion threshold. From these representative frames, a set of mutually exclusive frames is determined to minimize the number of representative frames, whereby each representative frame in the set represents a unique set of contiguous initial frames and has an associated distortion measure which does not exceed the distortion threshold.
