Abstract:
In a speech synthesis apparatus for outputting synthesized speech on the basis of a parameter sequence of a speech waveform, a parameter generation unit generates a parameter sequence for speech synthesis on the basis of a character sequence input by a character sequence input unit, and stores the generated parameter sequence in a parameter storage unit. A waveform generation unit generates pitch waveforms each for one pitch period on the basis of synthesis parameters and pitch scales included in the parameter sequence, and generates a speech waveform by connecting the generated pitch waveforms in accordance with frame lengths set by a frame length setting unit.
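The core of the abstract above is the waveform generation step: produce one waveform per pitch period from the synthesis parameters, then concatenate the pitch waveforms according to the frame lengths. A minimal sketch of that pipeline follows; the decaying-sinusoid "pitch waveform" and all function names are illustrative assumptions, standing in for the (unspecified) synthesis-parameter-driven waveform generator.

```python
import numpy as np

def pitch_waveform(pitch_period, formant_freq=800.0, decay=40.0, sr=16000):
    """Hypothetical one-pitch-period waveform: a decaying sinusoid
    standing in for the waveform derived from the synthesis parameters."""
    n = round(sr * pitch_period)          # samples in one pitch period
    t = np.arange(n) / sr
    return np.exp(-decay * t) * np.sin(2 * np.pi * formant_freq * t)

def synthesize(pitch_periods, sr=16000):
    """Concatenate one-pitch-period waveforms back to back, so the
    pitch contour is set by the sequence of pitch periods."""
    return np.concatenate([pitch_waveform(p, sr=sr) for p in pitch_periods])

# Pitch periods in seconds (8 ms then 7.5 ms, i.e. 125 Hz then ~133 Hz).
speech = synthesize([0.008, 0.008, 0.0075, 0.0075])
```

Because each generated waveform spans exactly one pitch period, the fundamental frequency of the output is controlled directly by the pitch-period sequence, which is how the pitch scales in the parameter sequence steer the synthesized speech.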
Abstract:
There is provided a speech synthesizer including first indication means for indicating the amplitude of an impulse response waveform by using a random number; second indication means for indicating the superposition period for impulse response waveforms by using a random number; impulse response waveform generating means for generating an impulse response waveform having the amplitude indicated by the first indication means; and waveform superposition means for synthesizing an unvoiced speech waveform by superposing an impulse response waveform generated by the impulse response waveform generating means onto an impulse response waveform obtained by delaying the first-mentioned impulse response waveform by the superposition period indicated by the second indication means. The speech synthesizer is thereby capable of making the frequency characteristic of the unvoiced speech section analogous to that of white noise, and of generating natural synthesized speech analogous to an actual human voice.
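The scheme above can be sketched as follows: copies of an impulse response are scaled by a random amplitude (the first indication) and superposed at random delays (the second indication), which randomizes the phase and flattens the spectrum toward that of white noise. The specific impulse response, the uniform amplitude distribution, and the delay range are all illustrative assumptions, not taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def impulse_response(length=64, sr=16000):
    """Hypothetical impulse response standing in for the synthesis
    filter's response (a damped cosine is used only for illustration)."""
    t = np.arange(length) / sr
    return np.exp(-300 * t) * np.cos(2 * np.pi * 2000 * t)

def unvoiced(n_samples=1600, ir=None):
    """Superpose randomly scaled, randomly delayed copies of the
    impulse response to build an unvoiced (noise-like) waveform."""
    ir = impulse_response() if ir is None else ir
    out = np.zeros(n_samples + len(ir))       # tail room for the last copy
    pos = 0
    while pos < n_samples:
        amp = rng.uniform(-1.0, 1.0)          # first indication: random amplitude
        out[pos:pos + len(ir)] += amp * ir    # superpose the delayed copy
        pos += int(rng.integers(1, 8))        # second indication: random period
    return out[:n_samples]

noise = unvoiced(1600)
```

With fixed amplitudes and a fixed superposition period the output would be periodic and spectrally peaky; randomizing both is what pushes the spectrum toward the flat characteristic of white noise that the abstract targets.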
Abstract:
Speech including a speech portion and a non-speech portion is input. A long-time cepstral mean of the speech portion and a long-time cepstral mean of the non-speech portion are obtained from the input speech. Each long-time cepstral mean is converted from the cepstral domain to the linear spectral domain, where the non-speech mean is subtracted from the speech mean on the linear spectrum; the difference is converted back to the cepstral domain, the long-time cepstral mean of the speech portion of a training speech database is subtracted from the converted result, and the resulting bias is added to a speech model expressed in cepstra. Thus, even when the noise is large, the estimation accuracy of the line (channel) fluctuation is raised and the recognition rate can be improved.
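The compensation described above can be sketched step by step. This is a simplified model, assuming the cepstrum is the inverse DFT of the log spectrum (so cepstral-to-linear conversion is an FFT followed by exponentiation); all function names and the flooring constant are illustrative.

```python
import numpy as np

def cep_to_linear(c):
    """Cepstrum -> linear spectrum: DFT gives the log spectrum,
    exponentiation gives the linear spectrum (simplified model)."""
    return np.exp(np.real(np.fft.fft(c)))

def linear_to_cep(s):
    """Linear spectrum -> cepstrum: log then inverse DFT."""
    return np.real(np.fft.ifft(np.log(s)))

def compensate(model_cep, mean_cep_speech, mean_cep_nonspeech, train_mean_cep):
    """Sketch of the abstract's scheme: subtract the non-speech (noise)
    long-time mean from the speech long-time mean on the linear spectrum,
    convert back to cepstra, subtract the training-database speech mean,
    and add the resulting bias to the cepstral speech model."""
    diff_lin = cep_to_linear(mean_cep_speech) - cep_to_linear(mean_cep_nonspeech)
    diff_lin = np.maximum(diff_lin, 1e-8)   # flooring keeps the log defined
    bias = linear_to_cep(diff_lin) - train_mean_cep
    return model_cep + bias
```

Subtracting in the linear spectral domain (rather than directly in cepstra) is the key point: additive noise is additive on the linear spectrum, so removing the non-speech mean there isolates the multiplicative channel component before the cepstral bias is applied to the model.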
Abstract:
Disclosed are a method and an apparatus for reading out a feature parameter and driving sound source information stored in a VCV (vowel-consonant-vowel) speech segment file, sequentially connecting the read-out parameter and the read-out sound source information in accordance with a predetermined rule, and supplying the connected data to a speech synthesizer, thereby generating a speech output. The apparatus includes a memory for storing an average power of each vowel, and a power controller for normalizing each VCV segment so that the powers at both ends of the segment coincide with the average power of the corresponding vowel.
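The power-normalization step above can be sketched as follows: measure the short-time power at both ends of a VCV segment, compute the gains that bring those powers to the stored average vowel powers, and apply a gain that varies across the segment so that concatenated segments meet at matching levels. The linear gain interpolation, the edge-window length, and all names are illustrative assumptions.

```python
import numpy as np

def normalize_vcv(segment, start_vowel_avg, end_vowel_avg, edge=32):
    """Scale a VCV waveform segment so that the mean powers of its
    first and last `edge` samples match the stored average powers of
    the starting and ending vowels, interpolating the gain linearly."""
    seg = np.asarray(segment, dtype=float)
    p_start = np.mean(seg[:edge] ** 2)                 # power at the left edge
    p_end = np.mean(seg[-edge:] ** 2)                  # power at the right edge
    g_start = np.sqrt(start_vowel_avg / max(p_start, 1e-12))
    g_end = np.sqrt(end_vowel_avg / max(p_end, 1e-12))
    gain = np.linspace(g_start, g_end, len(seg))       # smooth gain contour
    return seg * gain
```

Because consecutive VCV segments share a vowel at their junction, normalizing both ends of every segment to the per-vowel average power means adjacent segments arrive at the junction with the same level, avoiding audible power discontinuities.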
Abstract:
An object of the invention is to provide a method of generating a state transition model capable of high-speed voice recognition, and to provide a voice recognition method and apparatus using the state transition model. To this end, a method is provided which generates a state transition model in which the state-shared structure of the model is designed, the method including a step of setting the states of a triphone state transition model in an acoustic space as initial clusters; a clustering step of generating clusters containing the initial clusters by top-down clustering; a step of determining a state-shared structure by assigning a short-distance cluster, among the clusters generated by the clustering step, to the state transition model; and a step of learning a state-shared model by analyzing the states of the triphones in accordance with the determined state-shared structure.
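The top-down clustering step above can be illustrated with a toy divisive routine: start with a single cluster holding all triphone state vectors and repeatedly split the highest-variance cluster with a 2-means split until the desired number of shared states remains. The state vectors, the squared-Euclidean distance, and the variance-based split criterion are all illustrative assumptions; the abstract does not specify the distance measure or split rule.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_down_cluster(states, n_clusters):
    """Toy top-down (divisive) clustering of state vectors.
    Repeatedly splits the cluster with the largest total within-cluster
    variance using a 2-means split, until n_clusters clusters remain.
    Assumes n_clusters <= len(states)."""
    X = np.asarray(states, dtype=float)
    clusters = [list(range(len(X)))]          # start: one cluster of all states
    while len(clusters) < n_clusters:
        # Pick the cluster with the largest total variance to split next.
        i = max(range(len(clusters)),
                key=lambda k: X[clusters[k]].var(axis=0).sum() * len(clusters[k]))
        members = clusters.pop(i)
        pts = X[members]
        # 2-means split seeded with two distinct member points.
        c = pts[rng.choice(len(pts), 2, replace=False)]
        lab = np.zeros(len(pts), dtype=int)
        for _ in range(10):
            d = ((pts[:, None, :] - c[None]) ** 2).sum(-1)  # squared distances
            lab = d.argmin(1)
            for j in range(2):
                if (lab == j).any():
                    c[j] = pts[lab == j].mean(0)
        half0 = [m for m, l in zip(members, lab) if l == 0]
        half1 = [m for m, l in zip(members, lab) if l == 1]
        if not half0 or not half1:            # degenerate split: force one out
            half0, half1 = members[:1], members[1:]
        clusters += [half0, half1]
    return clusters
```

In the scheme of the abstract, the resulting clusters would define the state-shared structure: each triphone state is tied to the cluster it falls in (the nearest, short-distance cluster), and the shared model is then retrained under that tying.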