Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting a target voice from among a plurality of voices arriving from different directions by efficiently suppressing the mixing-in of voices other than the target voice. SOLUTION: The target voice is extracted by performing at least one of gain adjustment processing and utterance-section segmentation processing on the voice signals obtained by first and second voice input units arranged a predetermined distance apart, using a weighted Cross-power Spectrum Phase (CSP) coefficient that takes a small value in frequency bands likely to be influenced by voices other than the target voice. COPYRIGHT: (C)2011,JPO&INPIT
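As a rough illustration of the weighted CSP coefficient, the sketch below whitens the cross-power spectrum of one frame pair and applies per-bin weights before the inverse transform. The function name, the `weights` argument, and the way the weights are chosen are assumptions made for illustration; the abstract does not say how the weights are derived.

```python
import numpy as np

def weighted_csp(x1, x2, weights=None, eps=1e-12):
    """Return (optionally weighted) CSP coefficients for one frame pair.

    x1, x2  : equal-length real frames from the two voice input units.
    weights : hypothetical per-bin weights in [0, 1], one per rfft bin.
              Small values de-emphasise bands assumed to be influenced
              by non-target voices. None gives the unweighted CSP.
    """
    n = len(x1)
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = np.conj(X1) * X2
    phase = cross / (np.abs(cross) + eps)   # keep only the phase (whitening)
    if weights is not None:
        phase = phase * weights             # suppress unreliable bands
    # Index d of the result is a circular lag: a peak at d means
    # x2 lags x1 by d samples.
    return np.fft.irfft(phase, n=n)
```

With a frame of 256 samples there are 129 rfft bins, so `weights` would be an array of length 129. The position of the peak then serves as the direction cue, and its height could drive gain adjustment or utterance-section decisions.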
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting, from a speech signal, features that are even more robust to reverberation, noise, and the like. SOLUTION: A speech feature extraction apparatus receives, as input, values obtained by adding the spectrum of each frame of a speech signal segmented into frames to an average spectrum, that is, the average of the spectra over all frames of the overall speech. For each frame, the apparatus multiplies these values by the weights of a mel filter bank and sums the products, applies the discrete cosine transform to the logarithm of the sum, and calculates the difference in the discrete cosine transform output between the preceding and following frames, which it defines as a delta feature.
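The processing chain described above can be sketched as follows. This is a minimal illustration that follows the order of operations as worded in the abstract. The triangular mel filter construction, the number of filters and cepstral coefficients, and all function names are assumptions, not details taken from the source.

```python
import numpy as np

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters over n_bins rfft bins (textbook construction)."""
    nyquist = sample_rate / 2.0
    mel_max = 2595.0 * np.log10(1.0 + nyquist / 700.0)
    mel_points = np.linspace(0.0, mel_max, n_filters + 2)
    edges = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)  # back to Hz
    bin_freqs = np.linspace(0.0, nyquist, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_out, n_in):
    """Unnormalised DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_in)

def delta_features(power_spec, fb, n_ceps=13, eps=1e-10):
    """power_spec: (frames, bins) spectra of one utterance."""
    avg = power_spec.mean(axis=0, keepdims=True)   # average over all frames
    summed = power_spec + avg                      # frame spectrum + average
    mel_energy = summed @ fb.T                     # weight by filters and sum
    ceps = np.log(mel_energy + eps) @ dct_matrix(n_ceps, fb.shape[0]).T
    padded = np.pad(ceps, ((1, 1), (0, 0)), mode="edge")
    return padded[2:] - padded[:-2]                # following minus preceding
```

Because the average spectrum is added before the logarithm, the log never sees values near zero in low-energy frames, which is one plausible reading of why the resulting delta is less sensitive to noise.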
Abstract:
PROBLEM TO BE SOLVED: To provide a technique capable of detecting an ingressive in a voice signal with a high detection rate and high accuracy. SOLUTION: An ingressive detection device refers to acoustic models of ingressives and non-ingressives to determine ingressive candidates, and generates a feature vector whose elements are simplex information, meaning information on the ingressive candidate taken by itself, and context information. The context information is information on the relation between the ingressive candidate and the speech section that includes it, on the relation between the ingressive candidate and the ingressive candidates before and after it, or on both relations. The ingressive detection device obtains classification reference information for classifying an ingressive candidate as either ingressive or non-ingressive through machine learning with the feature vector as input, and classifies the ingressive candidate as either ingressive or non-ingressive on the basis of that classification reference information.
Abstract:
PROBLEM TO BE SOLVED: To extract only the voice of a target person in a noisy environment, without requiring a large-scale microphone array or a noise reference signal. SOLUTION: A target sound extraction method is disclosed in which practical speech recognition performance is achieved merely by performing gain adjustment between spectral subtraction (SS) processing and flooring processing, as the processing applied to two-channel input speech obtained from microphones 1 and 2 and the like. For the gain adjustment, a CSP (Cross-power Spectrum Phase) coefficient, which is the cross-correlation between the two channel signals, can be used. In indoor environments, including vehicle cabins where background audio and the like is present, the recognition rate of voice commands in a car navigation system is improved, which in turn improves usability for a speaker such as the driver. COPYRIGHT: (C)2009,JPO&INPIT
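The ordering of the three steps can be illustrated with a short sketch. It assumes the CSP coefficient is applied as a multiplicative gain clipped to the range 0 to 1, and that `alpha` and `beta` are the usual subtraction and flooring factors. These choices and all names are illustrative assumptions; the abstract gives no formulas.

```python
import numpy as np

def ss_gain_floor(power, noise, csp_gain, alpha=1.0, beta=0.1):
    """Spectral subtraction, then CSP-based gain, then flooring.

    power    : power spectrum of the current frame.
    noise    : estimated noise power spectrum (same shape).
    csp_gain : CSP coefficient for the target direction (assumed scalar).
    """
    gain = min(max(float(csp_gain), 0.0), 1.0)   # CSP can be negative; clip it
    subtracted = power - alpha * noise           # spectral subtraction (SS)
    adjusted = gain * subtracted                 # gain adjustment
    return np.maximum(adjusted, beta * noise)    # flooring
```

Placing the gain before the flooring step means that frames with a low CSP coefficient fall to the floor level instead of leaving scaled-down residual speech from other directions.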
Abstract:
PROBLEM TO BE SOLVED: To address the problem that, although the performance of a voice recognition device is generally known to deteriorate significantly in environments with long reverberation, most conventional reverberation removal methods either require a large amount of calculation or, where the amount of calculation is not large, require some kind of prior knowledge (the reverberation time of a room, etc.). SOLUTION: The coefficient used in the conventional technique of subtracting a coefficient multiple of the power spectrum of a past frame from the power spectrum of the current frame is determined at low cost, without using information that incurs calculation cost, such as an acoustic model or multi-channel input. Specifically, a voice power track is obtained that follows frames of large power closely and follows frames of small power with a delay; an interval in which this track differs significantly from the voice power of the current frame smoothed in the time direction is inferred to be an utterance-final reverberation interval; and the filter coefficient is decided so as to minimize the weighted sum of the residual voice power in that interval and the subtracted power in the utterance interval (not including the reverberation interval). COPYRIGHT: (C)2008,JPO&INPIT
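Two of the building blocks named above can be sketched in a few lines: subtracting a coefficient multiple of a past frame, and a power track that rises immediately but decays slowly. The peak-hold form of the track, the `decay` and `delay` parameters, and the function names are assumptions for illustration. The coefficient optimization itself is not shown.

```python
def subtract_past_frame(power_frames, coeff, delay=1, floor=0.0):
    """Subtract coeff times the power spectrum `delay` frames back."""
    out = []
    for t, frame in enumerate(power_frames):
        if t < delay:
            out.append(list(frame))          # no past frame available yet
            continue
        past = power_frames[t - delay]
        out.append([max(p - coeff * q, floor) for p, q in zip(frame, past)])
    return out

def power_track(frame_powers, decay=0.9):
    """Track that follows large-power frames at once and small ones late."""
    track, prev = [], 0.0
    for p in frame_powers:
        prev = max(p, decay * prev)          # fast attack, slow release
        track.append(prev)
    return track
```

After an utterance ends, such a track stays well above the smoothed frame power for a while, and that gap is the kind of cue the abstract uses to mark the utterance-final reverberation interval.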
Abstract:
PROBLEM TO BE SOLVED: To accurately collect the speech of only a specified speaker, such as a salesperson in over-the-counter sales or the like. SOLUTION: A speech collection system 10 extracts and collects target speech from among a plurality of pieces of speech arriving from mutually different directions. The system includes a microphone array 11 having at least first and second microphones 11a and 11b arranged a predetermined distance apart. A discrete Fourier transform is performed on each speech signal received by the first and second microphones, a plurality of cross-power spectrum phase (CSP) coefficients related to the arrival direction of the speech are calculated, and a plurality of speech signals are detected from the plurality of CSP coefficients. Then, a speech direction index, defined according to the angle between the line connecting the first and second microphones and the arrival direction, is detected from the plurality of calculated CSP coefficients, and the signal of the target speech is extracted from the plurality of detected speech signals on the basis of the detected speech direction index. COPYRIGHT: (C)2010,JPO&INPIT
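The relation between the arrival angle and the speech direction index can be made concrete with a small sketch. It assumes a far-field source, a speed of sound of 340 m/s, and an index expressed as a whole number of samples of inter-microphone delay. The function name and parameters are illustrative and do not come from the source.

```python
import math

def direction_index(angle_deg, mic_distance_m, sample_rate_hz,
                    sound_speed=340.0):
    """Inter-microphone delay, in samples, for a given arrival angle.

    angle_deg is measured between the line connecting the two
    microphones and the arrival direction of the speech.
    """
    delay_s = mic_distance_m * math.cos(math.radians(angle_deg)) / sound_speed
    return round(delay_s * sample_rate_hz)
```

For microphones 0.17 m apart sampled at 16 kHz, speech arriving along the microphone axis maps to an index of 8 or -8, and speech arriving broadside maps to 0. The CSP coefficient at the index for the target speaker's expected position is then the one to inspect.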
Abstract:
PROBLEM TO BE SOLVED: To provide an apparatus for synchronizing a content data stream with metadata by means of a feature value of the content data stream. SOLUTION: The apparatus for synchronizing content data with metadata includes: a storage device in which metadata including the feature value of the content data is recorded; a means for calculating the feature value from the content data; a means for retrieving the metadata in the storage device on the basis of the calculated feature value; and a means for reproducing the retrieved metadata in synchronism with the content data. COPYRIGHT: (C)2008,JPO&INPIT
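The four means listed above map naturally onto a lookup keyed by the feature value. The sketch below uses a byte digest as a stand-in feature and an in-memory dictionary as the storage device. Both are assumptions for illustration; a practical system would more likely use a perceptual feature that tolerates re-encoding.

```python
import hashlib

def feature_value(chunk: bytes) -> str:
    """Hypothetical feature value: a digest of the chunk's bytes."""
    return hashlib.sha1(chunk).hexdigest()

def build_store(chunks, metadata):
    """Record each metadata item under the feature value of its chunk."""
    return {feature_value(c): m for c, m in zip(chunks, metadata)}

def play_with_metadata(stream, store):
    """Yield (chunk, metadata) pairs; metadata is None when nothing matches."""
    for chunk in stream:
        yield chunk, store.get(feature_value(chunk))
```

Because the lookup depends only on the content itself, the metadata stays aligned even when the stream is played out of order or starts partway through.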
Abstract:
PROBLEM TO BE SOLVED: To provide a highly accurate voice activity detection method for low-S/N environments. SOLUTION: Voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal, and by increasing the difference in feature vectors between the speech and non-speech included in the speech signal using either the long-term spectrum variation component feature alone or the long-term spectrum variation component extraction together with the harmonic structure feature extraction. The correct rate and the accuracy rate of voice activity detection are improved over conventional methods by using a long-term spectrum variation component whose window length exceeds the average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provide speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection. COPYRIGHT: (C)2009,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To increase convenience for a user visiting islands in a virtual space. SOLUTION: In a virtual space comprising a plurality of islands, the positions of the islands are mapped two-dimensionally, preferably by multidimensional scaling such as Kruskal's method, such that the distance relation between a characteristic vector including information on the profile, tastes, or the like of the user and a characteristic vector including information on the profile, events, or the like of each island is maintained. A map server then provides the user with the islands arranged in accordance with the user's characteristic vector on the basis of the mapped information. This makes it convenient for the user to visit an island suited to his or her tastes, so that the frequency of use of the virtual space is improved. COPYRIGHT: (C)2009,JPO&INPIT
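The two-dimensional mapping can be illustrated with classical (Torgerson) multidimensional scaling, used here as a simpler stand-in for Kruskal's non-metric method named in the abstract. The sketch assumes Euclidean distance between characteristic vectors and that the user's vector is stacked together with the island vectors. The function name is illustrative.

```python
import numpy as np

def classical_mds(vectors, n_dims=2):
    """Embed characteristic vectors in n_dims, preserving distances.

    vectors: (n_items, n_features), e.g. the user followed by each island.
    """
    X = np.asarray(vectors, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)              # squared pairwise distances
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ d2 @ J                      # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]      # keep the largest n_dims
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

Islands whose characteristic vectors are close to the user's end up near the user's point on the resulting map, which is the property the map server relies on when arranging the islands.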