Abstract:
PROBLEM TO BE SOLVED: To extract only the voice of a target speaker in a noisy environment, without requiring a large-scale microphone array or a noise reference signal. SOLUTION: A target sound extraction method is disclosed in which practical speech recognition performance is achieved simply by performing gain adjustment between spectral subtraction (SS) processing and flooring processing on two-channel input speech obtained from microphones 1 and 2 and the like. The CSP (Cross-power Spectrum Phase) coefficient, which is the cross-correlation between the two channel signals, can be used for the gain adjustment. In an indoor environment, including a vehicle interior, where background sound such as audio is present, the recognition rate of voice commands in a car navigation system is improved, and usability for a speaker such as a driver is improved accordingly. COPYRIGHT: (C)2009,JPO&INPIT
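The processing chain in this abstract (spectral subtraction, then a CSP-based gain, then flooring) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the subtraction weight `alpha`, the flooring factor `beta`, the clipping of the gain to [0, 1], and the naive DFT are all assumptions made for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def csp(x1, x2, eps=1e-12):
    """CSP coefficient per lag: inverse DFT of the phase-only cross-power spectrum."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [a.conjugate() * b for a, b in zip(X1, X2)]
    # Whitening: discard the magnitude and keep only the phase difference per bin.
    phase = [c / max(abs(c), eps) for c in cross]
    return [sum(phase[k] * cmath.exp(2j * math.pi * k * lag / n)
                for k in range(n)).real / n
            for lag in range(n)]


def enhance_frame(power, noise_power, csp_gain, alpha=1.0, beta=0.1):
    """Spectral subtraction -> CSP gain -> flooring for one power-spectrum frame.

    alpha and beta are hypothetical tuning parameters, not values from the source.
    """
    gain = min(max(csp_gain, 0.0), 1.0)  # assumed: clip the coefficient to [0, 1]
    out = []
    for y, n in zip(power, noise_power):
        s = y - alpha * n             # spectral subtraction
        s = gain * s                  # gain adjustment between SS and flooring
        out.append(max(s, beta * n))  # flooring
    return out
```

With this sign convention, a signal that reaches the second microphone `d` samples later than the first produces a CSP peak at lag `d`. In use, the coefficient at the lag corresponding to the target speaker's direction would be passed as `csp_gain`, so frames dominated by sound from other directions are attenuated before flooring.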
Abstract:
PROBLEM TO BE SOLVED: To address the problem that, although the performance of a speech recognition device is generally known to deteriorate significantly where long reverberation exists, most conventional dereverberation methods either require a large amount of calculation or, where the amount of calculation is not large, require some kind of prior knowledge (the reverberation time of the room, etc.). SOLUTION: In the conventional technique in which a multiple of the power spectrum of a past frame is subtracted from the power spectrum of the current frame, the coefficient (the multiplier) is determined at low cost, without using information that incurs calculation cost, such as an acoustic model or multi-channel input. Specifically, a voice power track is obtained that follows frames of large power promptly and follows frames of small power with a delay; an interval in which this track differs significantly from the voice power of the current frame smoothed in the time direction is inferred to be an utterance-end reverberation interval; and the filter coefficient is decided so as to minimize the weighted sum of the residual voice power in that interval and the subtracted power in the utterance interval (not including the reverberation interval). COPYRIGHT: (C)2008,JPO&INPIT
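Two building blocks named in this abstract can be sketched in a few lines: a power track that rises immediately but decays slowly, and the subtraction of a scaled past-frame power spectrum. The sketch assumes the coefficient is already known; the abstract's actual coefficient search (minimizing a weighted sum of residual and subtracted power) is not reproduced here. The function names and the `release`, `delay`, and `floor` parameters are hypothetical.

```python
def track_power(frame_powers, release=0.9):
    """Follow rising power at once and falling power slowly (asymmetric smoothing)."""
    track = []
    prev = 0.0
    for p in frame_powers:
        if p >= prev:
            prev = p                                    # large power: follow promptly
        else:
            prev = release * prev + (1.0 - release) * p  # small power: follow late
        track.append(prev)
    return track


def subtract_late_reverb(power_frames, coeff, delay=1, floor=0.0):
    """Subtract coeff times the power spectrum `delay` frames back from each frame."""
    out = []
    for t, frame in enumerate(power_frames):
        if t < delay:
            out.append(list(frame))  # no past frame available yet
            continue
        past = power_frames[t - delay]
        out.append([max(p - coeff * q, floor) for p, q in zip(frame, past)])
    return out
```

For a purely exponential decay such as `[[4.0], [2.0], [1.0]]`, a coefficient of 0.5 removes the tail entirely; real speech would need the coefficient to balance residual reverberation against damage to the utterance itself, which is the trade-off the abstract's weighted sum expresses.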
Abstract:
PROBLEM TO BE SOLVED: To provide a method of identifying the speaker of each individual voice from recorded voices of a plurality of speakers with a simple device configuration, and a system using the method. SOLUTION: The system is equipped with microphones 10 provided for the respective speakers, a speech processing section 20 which imparts unique characteristics to the speech signals of two channels recorded by the microphones 10 through mutually different speech processes and mixes the signals channel by channel, and an analysis section 40 which performs analyses corresponding to the unique characteristics imparted to the speech signals of the respective microphones 10 by the processes of the speech processing section 20, to identify the speaker for each utterance section of the speech signals. COPYRIGHT: (C)2006,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To make it possible to estimate a sound source position with a small number of microphones, which has been difficult for conventional systems, and to improve on the conventional estimation accuracy of the sound source position. SOLUTION: The sound source position is estimated by forming a reflection surface RS as an enveloping surface of spheroids whose focal points are the location of a collection means and the sound source location, generating main reflected waves with an amount of delay corresponding to the sound source location, and inspecting the amount of delay between the direct wave and the reflected wave to acquire the sound source location. COPYRIGHT: (C)2005,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting a target voice from among a plurality of voices arriving from different directions by efficiently suppressing the mixing-in of voices other than the target voice. SOLUTION: The target voice is extracted by performing at least one of gain adjustment processing and utterance section segmentation processing on the voice signals obtained by first and second voice input units arranged a predetermined distance apart, using a weighted Cross-Power Spectrum Phase (CSP) coefficient whose weight takes a small value in frequency bands that are likely to be influenced by voices other than the target voice. COPYRIGHT: (C)2011,JPO&INPIT
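A weighted CSP coefficient evaluated at a single lag can be sketched as below. The abstract does not say how the weights are derived or normalized, so the per-bin `weights` list and the division by the sum of the weights are assumptions for illustration, as are the function names.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def weighted_csp_at(x1, x2, lag, weights, eps=1e-12):
    """Weighted CSP coefficient at one lag; small weights mute unreliable bins."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    total = 0.0
    for k in range(n):
        cross = X1[k].conjugate() * X2[k]
        phase = cross / max(abs(cross), eps)  # phase-only cross-power spectrum
        total += weights[k] * (phase * cmath.exp(2j * math.pi * k * lag / n)).real
    # Assumed normalization: divide by the total weight so the peak stays near 1.
    return total / max(sum(weights), eps)
```

Evaluating only the lag that corresponds to the target direction avoids a full inverse transform; bins where an interfering voice is expected are simply given weights close to zero.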
Abstract:
PROBLEM TO BE SOLVED: To allow a user who views a Web page on Internet-connected equipment to which no printer is connected, for example a portable telephone 19, to smoothly obtain a hard copy of the Web page. SOLUTION: An icon 52 for requesting facsimile transmission is displayed in a Web page. When the user clicks the icon 52, a facsimile server 12 is informed of the URL of the Web page, and the screen of a display part 51 is switched to a Web page of the facsimile server 12. The user then inputs, on this switched screen, a membership number for billing and the facsimile number of the transmission destination. The facsimile server 12 accesses the notified URL, generates data for facsimile output from the Web page, and transmits the data to facsimile equipment 36 at the transmission destination.
Abstract:
PROBLEM TO BE SOLVED: To provide a method and system for detecting the position of a user of a home video game machine. SOLUTION: A speaker 506 mounted in a remote controller reproduces a predetermined sound signal, the reproduced sound is observed by each of two microphones suitably placed in the vicinity of a television screen, CSP (whitened cross-correlation) coefficients between each observed sound signal and the reproduced sound signal are calculated, and the distances between the speaker inside the remote controller and the microphones are calculated, thereby acquiring the longitudinal and lateral absolute position of the remote controller with respect to the microphone array. Interference from environmental sound or noise is canceled by the correlation calculation. COPYRIGHT: (C)2010,JPO&INPIT
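The geometric step, turning two speaker-to-microphone distances into a position, can be sketched as follows. The sketch assumes the microphones sit at (-d/2, 0) and (+d/2, 0), that the propagation delay in samples has already been read off the CSP peak, and that sound travels at roughly 343 m/s; the function names and coordinate layout are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def delay_to_distance(delay_samples, sample_rate):
    """Convert a propagation delay in samples to a distance in metres."""
    return SPEED_OF_SOUND * delay_samples / sample_rate


def locate(r1, r2, mic_spacing):
    """Position (x, y) of the source given distances to mics at (-d/2, 0) and (+d/2, 0).

    From r1^2 = (x + d/2)^2 + y^2 and r2^2 = (x - d/2)^2 + y^2,
    subtracting gives r1^2 - r2^2 = 2 * x * d.
    """
    x = (r1 ** 2 - r2 ** 2) / (2.0 * mic_spacing)
    y_squared = r1 ** 2 - (x + mic_spacing / 2.0) ** 2
    if y_squared < 0.0:
        raise ValueError("distances are inconsistent with the microphone spacing")
    return x, math.sqrt(y_squared)  # y >= 0: source assumed in front of the array
```

Here `x` is the lateral offset along the microphone axis and `y` is the distance out from the array. Because the reproduced signal is known, each distance comes from correlating one microphone with the reference rather than the two microphones with each other, which is what makes an absolute position recoverable.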
Abstract:
PROBLEM TO BE SOLVED: To provide a speech processing technique that attains stable speech recognition even in noise. SOLUTION: The high-order and low-order terms of the cepstrum of the observed speech are cut, so that a filter is designed directly from the observed speech itself. The resulting filter places weight on the harmonic structure in voiced sections and is close to flat in unvoiced sections, which have no harmonic structure. Since this change is continuous, stable processing can be performed without distinguishing voiced sections from unvoiced sections. COPYRIGHT: (C)2009,JPO&INPIT
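The idea of cutting low-order and high-order cepstral terms to obtain a filter can be sketched as below. This is an illustration under stated assumptions: the input is a full-length (symmetric) power spectrum of one frame, the cepstrum is taken with a plain DFT, and the filter is obtained by exponentiating the liftered log spectrum. The cut-off indices `low_cut` and `high_cut` and all function names are hypothetical.

```python
import cmath
import math


def _transform(x, sign):
    """Naive DFT kernel; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def harmonic_filter(power_spectrum, low_cut, high_cut, eps=1e-12):
    """Build a per-bin filter from the frame's own cepstrum.

    Quefrencies below low_cut (spectral envelope) and above high_cut
    (fine detail) are zeroed; the remainder is mapped back and exponentiated.
    """
    n = len(power_spectrum)
    log_spec = [math.log(max(p, eps)) for p in power_spectrum]
    cepstrum = [c.real / n for c in _transform(log_spec, +1)]
    # Keep mirrored quefrencies too, so the resulting filter stays real-valued.
    liftered = [c if low_cut <= min(q, n - q) <= high_cut else 0.0
                for q, c in enumerate(cepstrum)]
    log_filter = [v.real for v in _transform(liftered, -1)]
    return [math.exp(v) for v in log_filter]
```

A flat spectrum has energy only at quefrency zero, so everything is cut and the filter comes out as all ones; a spectrum with regularly spaced peaks keeps its periodic component and the filter emphasizes those peaks. No voiced/unvoiced decision is involved, which matches the continuity argument in the abstract.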
Abstract:
PROBLEM TO BE SOLVED: To provide a novel noise reduction method applied to a speech recognition front-end. SOLUTION: The output of a front-end 3000 is optimized by applying to it, as a per-band weight, a confidence index representing the prominence of the harmonic structure of the observed speech. In a first method, when clean speech is estimated by MMSE estimation on a model that gives a probability distribution of the noise-removed speech generated from the observed speech, the posterior probability of the MMSE estimation is weighted using the confidence index. In a second method, linear interpolation is executed, for each band, between the observed value of the observed speech and the estimated value of clean speech, with the confidence index serving as the weight. The first method and the second method can be combined.
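The second method, per-band linear interpolation, reduces to one line per band. The abstract does not state which endpoint the confidence index favors; the sketch below assumes that a high confidence (prominent harmonic structure, so the band is presumably less corrupted) puts more weight on the observed value. That direction, the [0, 1] range of the index, and the function name are assumptions.

```python
def interpolate_bands(observed, estimated, confidence):
    """Blend observed and estimated clean values per band.

    Assumed convention: confidence 1.0 keeps the observation unchanged,
    confidence 0.0 uses the clean-speech estimate alone.
    """
    return [c * y + (1.0 - c) * x_hat
            for y, x_hat, c in zip(observed, estimated, confidence)]
```

For example, `interpolate_bands([2.0, 2.0, 2.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.5])` gives `[2.0, 1.0, 1.5]`. If the opposite convention were intended, swapping the two weights is the only change needed.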
Abstract:
PROBLEM TO BE SOLVED: To provide a highly accurate voice activity detection method for low-S/N environments. SOLUTION: Voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal, and by increasing the difference in feature vectors between speech and non-speech in the speech signal using the long-term spectrum variation component feature alone, or both long-term spectrum variation component extraction and harmonic structure feature extraction. The correct rate and accuracy rate of voice activity detection are improved over conventional methods by using a long-term spectrum variation component whose window length exceeds the average phoneme duration of an utterance in the speech signal. The voice activity detection system and method enable speech processing, automatic speech recognition, and speech output with very accurate voice activity detection. COPYRIGHT: (C)2009,JPO&INPIT