Abstract:
PROBLEM TO BE SOLVED: To address the well-known problem that the performance of a voice recognition device deteriorates significantly in environments with long reverberation, while most conventional reverberation removal methods either require a large amount of calculation or, where the amount of calculation is not large, require some kind of prior knowledge (the reverberation time of the room, etc.). SOLUTION: In the conventional technique of subtracting a coefficient multiple of the power spectrum of a past frame from the power spectrum of the current frame, the coefficient is determined at low cost, without using information that incurs calculation cost, such as an acoustic model or multi-channel input. Specifically, a voice power track that follows frames of large power promptly and frames of small power with a delay is obtained; an interval in which this track differs significantly from the voice power of the current frame smoothed in the time direction is inferred to be an utterance-end reverberation interval; and the filter coefficient is decided so as to minimize the weighted sum of the residual voice power in that interval and the subtracted power in the utterance interval (not including the reverberation interval). COPYRIGHT: (C)2008,JPO&INPIT
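The two building blocks named in this abstract, the asymmetric power track and the subtraction of a scaled past-frame power, can be illustrated with a minimal sketch. The function names, the `attack` and `release` rates, and the one-frame `delay` are assumptions made for illustration; the abstract does not give concrete values, and the coefficient optimization itself is not shown.

```python
def track_power(power, attack=0.9, release=0.1):
    """Asymmetric smoothing of per-frame power (hypothetical rates).

    The track moves quickly toward frames of larger power and only
    slowly toward frames of smaller power.
    """
    track = []
    prev = power[0]
    for p in power:
        rate = attack if p > prev else release
        prev = prev + rate * (p - prev)
        track.append(prev)
    return track


def subtract_past_frame(power, coeff, delay=1, floor=0.0):
    """Subtract coeff * power[t - delay] from power[t], clipped at `floor`."""
    out = []
    for t, p in enumerate(power):
        past = power[t - delay] if t >= delay else 0.0
        out.append(max(p - coeff * past, floor))
    return out


if __name__ == "__main__":
    frames = [0.0, 10.0, 10.0, 0.0, 0.0]
    print(track_power(frames, attack=1.0, release=0.5))
    print(subtract_past_frame([4.0, 2.0, 1.0], coeff=0.5))
```

Frames where the slow-decaying track stays well above the smoothed current power would be candidates for the utterance-end reverberation interval; how that threshold and the weighting are chosen is left open here.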
Abstract:
PROBLEM TO BE SOLVED: To provide a method of specifying the speakers of individual voices from recorded voices of a plurality of speakers with a simple device constitution, and a system using the same method. SOLUTION: The system is equipped with microphones 10 which are provided for the respective speakers, a speech processing section 20 which imparts unique characteristics to the speech signals of two channels recorded by the microphones 10 through mutually different speech processes and mixes the signals channel by channel, and an analysis section 40 which performs analyses corresponding to the unique characteristics imparted to the speech signals from the microphones 10 by the processes of the speech processing section 20, thereby specifying the speaker of each utterance section of the speech signals. COPYRIGHT: (C)2006,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To make it possible to estimate a sound source position using a small number of microphones, which has been difficult for conventional systems, and to improve on the conventional estimation accuracy of the sound source position. SOLUTION: A reflection surface RS is formed as an enveloping surface of spheroids whose focal points are the location of the collection means and the sound source location, so that a main reflected wave is generated with an amount of delay corresponding to the sound source location; the sound source location is then estimated by inspecting the amount of delay between the direct wave and the reflected wave. COPYRIGHT: (C)2005,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To provide a vocabulary prediction method which can raise the accuracy of prediction. SOLUTION: At each step of word prediction, the partial parse tree carrying information useful for predicting the next word is selected from the sequence of partial parse trees covering the word string up to that point. In other words, the accuracy of the prediction is improved by selecting the word and/or word string most useful for predicting the next word, based upon the relation structure of the word string serving as the history. After the partial parse tree relating to the word to be predicted is specified, the next word is predicted from that partial parse tree, that is, from the word and/or word string estimated to have a relation with the word to be predicted.
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for performing speech recognition using an acoustic invariant structure for large vocabulary continuous speech. SOLUTION: An information processing device 100 includes: a speech recognition processing unit for receiving speech as input, performing speech recognition, and outputting a plurality of hypotheses which result from the recognition together with their speech recognition scores; a structure score calculation unit for calculating, for each hypothesis, a structure score obtained by considering all phoneme pairs constituting the hypothesis and summing the likelihoods of the inter-distribution distances of the phoneme pairs, each multiplied by a per-pair weight; and a ranking unit for re-ranking the multiple hypotheses on the basis of the sum of the speech recognition score and the structure score.
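The re-ranking step described here can be sketched as follows. This is an illustration only: the function names, the dictionary-based lookup of pair likelihoods, and the default weight of 1.0 for unlisted pairs are assumptions, and the likelihood values in the usage example are made up.

```python
from itertools import combinations


def structure_score(phonemes, pair_likelihood, pair_weight):
    """Sum weight * likelihood over all phoneme pairs in one hypothesis.

    `pair_likelihood` and `pair_weight` are hypothetical lookup tables
    keyed by a sorted (phoneme, phoneme) tuple; a missing weight is
    treated as 1.0 (an assumption, not taken from the abstract).
    """
    total = 0.0
    for a, b in combinations(phonemes, 2):
        key = tuple(sorted((a, b)))
        total += pair_weight.get(key, 1.0) * pair_likelihood[key]
    return total


def rerank(hypotheses, pair_likelihood, pair_weight):
    """Sort hypotheses by recognition score + structure score, best first.

    Each hypothesis is a (text, phonemes, recognition_score) tuple.
    """
    def combined(hyp):
        _text, phonemes, asr_score = hyp
        return asr_score + structure_score(phonemes, pair_likelihood, pair_weight)

    return sorted(hypotheses, key=combined, reverse=True)


if __name__ == "__main__":
    likelihood = {("a", "k"): -2.0, ("a", "t"): -0.5}
    hyps = [("ka", ["k", "a"], -1.0), ("ta", ["t", "a"], -1.5)]
    for text, _, _ in rerank(hyps, likelihood, {}):
        print(text)
```

In the usage example the second hypothesis has the lower recognition score but the better structure score, so it moves to the top after re-ranking.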
Abstract:
PROBLEM TO BE SOLVED: To provide a method and system for detecting the position of a user of a home television game machine. SOLUTION: A speaker 506 mounted in a remote controller is used to reproduce a signal of a predetermined reproduced sound; the reproduced sound is observed by each of two microphones suitably provided in the vicinity of a television screen; CSP (whitened cross-correlation) coefficients between the signal of each observed sound and the signal of the reproduced sound are calculated; and the distances between the speaker inside the remote controller and the microphones are calculated, thereby acquiring the longitudinal and lateral absolute positions of the remote controller with respect to the microphone array. Interference from environmental sound or noise is canceled by the correlation calculation. COPYRIGHT: (C)2010,JPO&INPIT
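A minimal sketch of the processing chain in this abstract: a whitened cross-correlation to find the arrival lag, conversion of the lag to a distance, and a two-microphone position estimate. It assumes that playback and recording share a clock, that the lag is circular and measured in whole samples, that the speed of sound is 343 m/s, and that the microphones sit at (-spacing/2, 0) and (+spacing/2, 0); none of these details come from the abstract, and all function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short examples."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def csp_lag(observed, reference, eps=1e-12):
    """Circular lag (in samples) of `observed` relative to `reference`.

    The cross spectrum is normalized to unit magnitude (whitening) before
    the inverse transform, so only phase information is kept.
    """
    cross = [o * r.conjugate() for o, r in zip(dft(observed), dft(reference))]
    whitened = [c / (abs(c) + eps) for c in cross]
    csp = [v.real for v in dft(whitened, inverse=True)]
    return max(range(len(csp)), key=csp.__getitem__)


def lag_to_distance(lag, sample_rate, speed_of_sound=343.0):
    """Convert a propagation delay in samples to a distance in metres."""
    return lag / sample_rate * speed_of_sound


def position_from_distances(r1, r2, spacing):
    """Position (x, y) given distances to mics at (-spacing/2, 0) and (+spacing/2, 0)."""
    x = (r1 * r1 - r2 * r2) / (2.0 * spacing)
    y = math.sqrt(max(r1 * r1 - (x + spacing / 2.0) ** 2, 0.0))
    return x, y


if __name__ == "__main__":
    reference = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
    observed = [0.0, 0.0, 0.0, 1.0, -0.5, 0.25, 0.0, 0.0]  # reference delayed by 3 samples
    lag = csp_lag(observed, reference)
    print(lag, lag_to_distance(lag, 16000))
```

Because whitening discards the magnitude of each frequency bin, the correlation peak stays sharp even when the observed signal is coloured by the room, which is in line with the abstract's remark that interference is suppressed by the correlation calculation.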
Abstract:
PROBLEM TO BE SOLVED: To provide a method, a means and a program for high-accuracy speech recognition and natural synthesized speech output in a language having large variations in speech tone. SOLUTION: A statistical model is learned by observing, with a linear approximation method or a global smoothing method, the F0 tilt between the start point and the end point of a phoneme; the F0 tilt is evaluated at runtime; and synthesized speech in which the F0 is corrected on the basis of cost calculation is output. The time change of the F0 tilt within a syllable is modeled by learning a decision tree for each of the regions into which the syllable is equally divided. Likelihood is evaluated by estimating an error range of the observed F0 tilt. By linking these operations, high-accuracy speech recognition and synthesized speech output with natural tone are obtained. COPYRIGHT: (C)2010,JPO&INPIT
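Observing the F0 tilt by linear approximation over equally divided regions might look like the sketch below. It assumes that "linear approximation" means an ordinary least-squares line fit and that regions are formed by splitting the frames evenly by index; the function names are hypothetical, and the decision-tree modeling and cost calculation are not shown.

```python
def f0_tilt(times, f0):
    """Least-squares slope of F0 over time (Hz per time unit)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def region_tilts(times, f0, regions=3):
    """Split the frames into `regions` equal parts and fit a tilt to each.

    Each region needs at least two frames; the last region absorbs any
    remainder when the frame count is not divisible by `regions`.
    """
    size = len(times) // regions
    if size < 2:
        raise ValueError("need at least two frames per region")
    tilts = []
    for i in range(regions):
        start = i * size
        end = (i + 1) * size if i < regions - 1 else len(times)
        tilts.append(f0_tilt(times[start:end], f0[start:end]))
    return tilts


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5]
    f0 = [100, 110, 120, 120, 110, 100]  # rising then falling contour
    print(region_tilts(times, f0, regions=2))
```

A rising-then-falling contour gives a positive tilt for the first region and a negative tilt for the second, which is the kind of per-region feature a statistical tone model could be trained on.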
Abstract:
PROBLEM TO BE SOLVED: To provide a voice processing technique that attains stable voice recognition even in noise. SOLUTION: The high-order terms and low-order terms of the cepstrum of the observed voice are cut, and a filter is designed directly from the observed voice itself. The resulting filter weights the harmonic structure in sections of voiced sound and is close to flat in sections of voiceless sound, which have no harmonic structure. Since this change is continuous, stable processing can be performed without distinguishing voiced sound sections from voiceless sound sections. COPYRIGHT: (C)2009,JPO&INPIT
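One way to read the filter design in this abstract is as cepstral liftering: take the cepstrum of the observed frame, keep only a middle band of quefrencies, and transform back. The sketch below follows that reading. Taking the exponential of the liftered log spectrum as the filter weights is an assumption, as are the function names and the cut-off parameters; the input is assumed to be a full (symmetric) power spectrum with strictly positive values.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short examples."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def harmonic_filter(power_spectrum, low_cut, high_cut):
    """Filter weights derived from the observed frame itself.

    Cepstral terms whose quefrency index lies outside [low_cut, high_cut]
    are zeroed (cutting both low-order and high-order terms); the rest
    are transformed back to a log spectrum and exponentiated.
    """
    n = len(power_spectrum)
    log_spec = [math.log(p) for p in power_spectrum]
    cepstrum = [c.real for c in dft(log_spec, inverse=True)]
    liftered = [
        c if low_cut <= min(q, n - q) <= high_cut else 0.0
        for q, c in enumerate(cepstrum)
    ]
    return [math.exp(v.real) for v in dft(liftered)]


if __name__ == "__main__":
    flat = [1.0] * 8
    rippled = [math.exp(math.cos(2 * math.pi * 2 * k / 8)) for k in range(8)]
    print(harmonic_filter(flat, 2, 3))     # no harmonic ripple: weights near 1
    print(harmonic_filter(rippled, 2, 3))  # ripple at quefrency 2 is retained
```

A frame with no periodic ripple in its log spectrum yields weights of about 1 everywhere, while a frame with ripple inside the retained quefrency band yields weights that peak on the ripple maxima, so the same procedure covers both cases without a voiced/voiceless decision.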