Abstract:
A system that outputs the phonemes and accents of a text. The system has a storage section storing a first corpus in which the spellings, phonemes, and accents of previously input text are recorded separately for the individual word segmentations contained in that text. When a text for which phonemes and accents are to be output is acquired, the first corpus is searched for at least one set of contiguous spellings that matches the spellings in the text. The combination of phonemes and accents whose probability of occurrence in the first corpus exceeds a predetermined reference probability is then selected as the phonemes and accents of the text.
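The selection step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the corpus entries, readings, and the reference probability are all invented for the example.

```python
from collections import Counter

# Hypothetical first corpus: each observation pairs a word spelling with the
# (phoneme, accent) combination recorded for it. Data is illustrative only.
corpus = [
    ("端", "hashi", "LH"),
    ("端", "hashi", "LH"),
    ("端", "hashi", "HL"),
    ("橋", "hashi", "LH"),
]

def select_reading(spelling, reference_prob=0.5):
    """Pick the (phoneme, accent) pair whose probability of occurrence for
    this spelling in the corpus exceeds the reference probability."""
    counts = Counter((p, a) for s, p, a in corpus if s == spelling)
    total = sum(counts.values())
    for (phoneme, accent), n in counts.most_common():
        if n / total > reference_prob:
            return phoneme, accent
    return None  # no combination beats the reference probability

print(select_reading("端"))  # → ('hashi', 'LH'), occurring 2 times out of 3
```

A real system would match sets of contiguous spellings against the segmented input text rather than single words; the per-word lookup here only illustrates the probability-threshold selection.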
Abstract:
The present invention relates to a speech recognition system comprising: means (4) for performing a frequency analysis of input speech over a succession of time periods to obtain feature vectors; means (8) for producing a corresponding label train using a vector quantization code book (9); means (11) for matching a plurality of word baseforms, each expressed as a train of Markov models corresponding to labels, against said label train; means (14) for recognizing the input speech on the basis of the matching result; and means (5, 6, 7, 9) for performing an adaptation operation on the system to improve its ability to recognise speech. According to the invention, the speech recognition system is characterised in that the means for performing the adaptation operation comprises: means (4) for dividing each of a plurality of input speech words into N segments (N being an integer greater than 1) and producing a representative value of the feature vector of each segment of each input speech word; means for dividing into segments the word baseforms corresponding to said input speech words and for producing a representative value of each segment feature vector of each word baseform on the basis of the prototype vectors of the vector quantization code book; means for producing a displacement vector indicating the distance between the representative value of each segment of each input speech word and the representative value of the corresponding segment of the corresponding word baseform; means for storing the degree of relation between each segment of each input speech word and each label in the label group of the vector quantization code book; and prototype adaptation means for correcting the prototype vector of each label in the label group of the vector quantization code book by each displacement vector in accordance with the degree of relation between the label and the displacement vector.
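The prototype adaptation step can be sketched numerically. The sizes, the random data, and the choice of a relation-weighted average as the correction rule are assumptions for illustration; the patent only requires that each prototype be corrected by the displacement vectors in accordance with the degrees of relation.

```python
import numpy as np

# Illustrative sizes: 4 code-book labels, 3 segments, 2-dim features.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 2))        # VQ code book prototype vectors
seg_input = rng.normal(size=(3, 2))         # representative value per input-word segment
seg_baseform = rng.normal(size=(3, 2))      # representative value per baseform segment
relation = np.abs(rng.normal(size=(3, 4)))  # degree of relation, segment x label

# One displacement vector per segment: input representative minus baseform representative.
displacement = seg_input - seg_baseform

# Shift each prototype by the relation-weighted average of the displacements.
weights = relation / relation.sum(axis=0, keepdims=True)
adapted = prototypes + weights.T @ displacement
```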
Abstract:
The present invention relates to a speech recognition system of the type comprising a plurality of probabilistic finite state models, each having a number of states and the ability to undergo transitions from one state to another while producing a corresponding output representing a speech element, and means for defining, for each model, probability parameters each representing the probability that the model will undergo a transition from one predetermined state to another predetermined state and produce a corresponding output. Such a system can be used to recognise input speech data by initially dividing the input speech data into individual speech elements (4, 5, 6), applying the input speech elements to the models, and utilising the probability parameters of the models to recognise the input speech elements. According to the invention, the speech recognition system is characterised in that it comprises training means (8) for supplying training speech data to the models in order to train them and to define initial values of the probability parameters for each model, and adaptation means (9) for supplying adaptation speech data to the models in order to adapt them and to define adapted values of the probability parameters for each model. The adapted values of the probability parameters are used to recognise the input speech elements (10).
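One common way to derive adapted parameter values from a small amount of adaptation data is to interpolate the initial (training-data) estimates with estimates from the adaptation data. The abstract does not fix the formula, so the interpolation scheme, the matrices, and the weight below are all assumptions for illustration.

```python
import numpy as np

# Initial transition probability parameters from training speech data
# (rows: from-state, columns: to-state); values are illustrative.
initial = np.array([[0.7, 0.3],
                    [0.1, 0.9]])

# Transition counts observed while supplying adaptation speech data to the model.
adapt_counts = np.array([[2.0, 8.0],
                         [1.0, 9.0]])

def adapt(initial, counts, lam=0.5):
    """Define adapted parameter values by interpolating the initial values
    with the adaptation-data estimates (one common adaptation scheme)."""
    est = counts / counts.sum(axis=1, keepdims=True)
    out = lam * initial + (1 - lam) * est
    return out / out.sum(axis=1, keepdims=True)  # keep rows stochastic

adapted = adapt(initial, adapt_counts)
```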
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting, from a speech signal, features that are even more robust to reverberation, noise, and the like. SOLUTION: A speech feature extraction apparatus is configured to: receive, as input, values obtained by adding, to the spectrum of each frame of a speech signal segmented into frames, the average spectrum taken over all frames of the speech; for each frame, multiply these values by the weights of a mel filter bank and sum the products; apply the discrete cosine transform to the logarithm of the sum; and compute, as a delta feature, the difference in the discrete cosine transform between preceding and succeeding frames.
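The pipeline can be sketched end to end. The frame count, spectrum size, and filter-bank weights below are random placeholders (a real mel filter bank would use triangular filters on the mel scale), and the two-frame delta window is one common choice, not taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
spectra = rng.random((20, 129)) + 1e-3  # power spectrum per frame (illustrative)
mel_weights = rng.random((129, 24))     # stand-in for real mel filter bank weights

# Add the average spectrum over all frames to each frame's spectrum.
avg = spectra.mean(axis=0)
enhanced = spectra + avg

def dct2(x):
    """Plain DCT-II along the last axis (normalization is irrelevant here)."""
    n = x.shape[-1]
    basis = np.cos(np.pi * (np.arange(n)[:, None] + 0.5) * np.arange(n)[None, :] / n)
    return x @ basis

# Mel filter bank -> log -> DCT, per frame.
cepstra = dct2(np.log(enhanced @ mel_weights))

# Delta feature: DCT difference between succeeding and preceding frames.
delta = cepstra[2:] - cepstra[:-2]
```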
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting idle-talk parts from a conversation. SOLUTION: An idle-talk extraction system for extracting idle talk from a conversation comprises: a first corpus including documents in a plurality of fields; a second corpus including only documents in the field to which the conversation belongs; a determination part that determines, among the words included in the second corpus, any word whose idf value for the first corpus and idf value for the second corpus are each below a first prescribed threshold to be a lower-limit subject word; a score calculation part that calculates, as a score, a tf-idf value for each word included in the second corpus and, for lower-limit subject words, uses a constant set as a lower limit instead of the tf-idf value; a clipping part that sequentially cuts out intervals to be processed from text data of the content of the conversation; and an extraction part that extracts, as an idle-talk part, any interval in which the average score of the words included in the interval is larger than a second prescribed threshold.
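The scoring logic can be sketched with toy corpora. The documents, thresholds, and lower-limit constant are all invented; a real deployment would compute idf over large document collections and tune both thresholds.

```python
import math
from collections import Counter

# Toy corpora: first corpus spans many fields, second covers only the
# conversation's field. All data is illustrative.
first_corpus = [["price", "contract", "weather"], ["weather", "lunch", "price"]]
second_corpus = [["price", "contract", "delivery"], ["price", "delivery", "weather"]]

def idf(word, docs):
    df = sum(word in d for d in docs)
    return math.log(len(docs) / df) if df else float("inf")

IDF_THRESHOLD = 0.5   # first prescribed threshold (illustrative)
LOWER_LIMIT = 0.2     # constant used instead of tf-idf for lower-limit words
vocab = {w for d in second_corpus for w in d}

# A word is a lower-limit subject word when its idf is below the threshold
# for BOTH corpora.
lower_limit_words = {w for w in vocab
                     if idf(w, first_corpus) < IDF_THRESHOLD
                     and idf(w, second_corpus) < IDF_THRESHOLD}

def score(word, interval):
    if word in lower_limit_words:
        return LOWER_LIMIT
    tf = Counter(interval)[word] / len(interval)
    return tf * idf(word, second_corpus)

def is_idle_talk(interval, threshold=0.15):
    """Interval is idle talk if its mean word score exceeds the second threshold."""
    return sum(score(w, interval) for w in interval) / len(interval) > threshold
```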
Abstract:
PROBLEM TO BE SOLVED: To provide a technology capable of detecting ingressive sounds in a voice signal with a high detection rate and high accuracy. SOLUTION: An ingressive detection device refers to acoustic models of ingressive and non-ingressive sounds to determine ingressive candidates, and generates a feature vector whose elements are standalone information, i.e. information on the ingressive candidate by itself, and context information. The context information is information on the relation between the ingressive candidate and the speech section containing it, on the relation between the ingressive candidate and the candidates before and after it, or on both relations. Through machine learning with the feature vector as input, the device obtains classification reference information for classifying each candidate as either ingressive or non-ingressive, and classifies the candidates on the basis of that information.
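The classification step can be sketched with a stand-in learner. The abstract does not name the machine-learning method, so a perceptron, the three-element feature vectors, and the labels below are all assumptions; only the overall shape (feature vector in, classification reference information out) follows the abstract.

```python
import numpy as np

# Each candidate's feature vector concatenates standalone information about
# the candidate itself with context information (relations to the containing
# speech section and to neighbouring candidates). Values are illustrative.
X = np.array([[0.9, 0.1, 0.8],   # [standalone score, left context, right context]
              [0.2, 0.7, 0.1],
              [0.8, 0.2, 0.9],
              [0.1, 0.8, 0.2]])
y = np.array([1, 0, 1, 0])       # 1 = ingressive, 0 = non-ingressive

# Machine learning step: the learned weights and bias play the role of the
# classification reference information.
w, b = np.zeros(3), 0.0
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += (yi - pred) * xi
        b += yi - pred

def classify(x):
    """Classify a candidate as ingressive (1) or non-ingressive (0)."""
    return 1 if x @ w + b > 0 else 0
```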
Abstract:
PROBLEM TO BE SOLVED: To appropriately process voice data of a dialogue between two persons. SOLUTION: A system for processing voice data of a dialogue between two speakers comprises: a first transition calculating section for calculating the transition of the utterance rate of a first speaker from the voice data; a second transition calculating section for calculating the transition of the utterance rate of a second speaker from the voice data; a difference calculating section for calculating a difference data sequence expressing the transition of the difference between the utterance rates of the first and second speakers; a smoothing section for creating a smoothed difference data sequence in which the difference data sequence is smoothed; and a presentation section for presenting the transition of the utterance rates of the two speakers as expressed by the smoothed difference data sequence. COPYRIGHT: (C)2009,JPO&INPIT
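The difference and smoothing steps can be sketched directly. The rate values are invented, and the moving average is one simple smoothing choice; the abstract does not specify the smoothing method.

```python
import numpy as np

# Utterance rate (e.g. morae per second) per time window for each speaker;
# the numbers are illustrative.
rate_a = np.array([5.0, 6.0, 7.0, 6.5, 5.5, 5.0])
rate_b = np.array([4.0, 4.5, 5.0, 6.0, 6.5, 7.0])

# Difference data sequence: transition of the rate difference over time.
diff = rate_a - rate_b

def smooth(seq, width=3):
    """Moving average as one simple smoothing choice."""
    kernel = np.ones(width) / width
    return np.convolve(seq, kernel, mode="valid")

smooth_diff = smooth(diff)  # the smoothed difference data sequence
```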
Abstract:
PROBLEM TO BE SOLVED: To extract only the voice of a target person in a noisy environment, without requiring a large-scale microphone array or a noise reference signal. SOLUTION: An object sound extraction method is disclosed in which practical speech recognition performance is achieved simply by performing gain adjustment between spectral subtraction (SS) processing and flooring processing on the two-channel input speech obtained from microphones 1 and 2. For the gain adjustment, a CSP (Cross-power Spectrum Phase) coefficient, which expresses the cross-correlation between the two channel signals, can be utilized. In indoor environments, including vehicle cabins with audio background sound and the like, the recognition rate of voice commands in a car navigation system is improved, and thus usability for speakers such as the driver is improved. COPYRIGHT: (C)2009,JPO&INPIT
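The two building blocks can be sketched as follows. The flooring coefficient and the way the gain is applied are illustrative assumptions; the abstract only states that the CSP coefficient can drive the gain adjustment between SS and flooring.

```python
import numpy as np

def csp_coefficient(x1, x2):
    """Peak of the inverse FFT of the phase of the cross-power spectrum;
    close to 1 when the two channels carry the same coherent source."""
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = X1 * np.conj(X2)
    phase = cross / (np.abs(cross) + 1e-12)  # keep phase only
    return np.fft.irfft(phase, n=len(x1)).max()

def extract(spec, noise_spec, gain, floor_coeff=0.1):
    """Spectral subtraction followed by flooring; in the method of the
    abstract, the gain would be adjusted using the CSP coefficient."""
    sub = spec - gain * noise_spec
    return np.maximum(sub, floor_coeff * spec)

rng = np.random.default_rng(0)
target = rng.normal(size=256)  # a coherent signal seen by both microphones
```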
Abstract:
PROBLEM TO BE SOLVED: To acquire character strings to be recognized as phrases, together with their pronunciations, more accurately than before. SOLUTION: A system selects from an input text a plurality of candidate character strings as candidates to be recognized as phrases. For each selected candidate character string, it combines predetermined pronunciations of the individual characters to generate a plurality of pronunciation candidates. It then combines data associating each generated pronunciation candidate with its candidate character string with language model data recording the frequency of appearance of each phrase in text, to generate frequency data giving the frequency of appearance of each pair of a character string representing a phrase and a pronunciation. An input speech is speech-recognized on the basis of the generated frequency data to produce recognition data in which the character strings of the phrases contained in the input speech are associated with pronunciations, and the combinations of candidate character strings and pronunciation candidates that appear in the recognition data are selected and output. COPYRIGHT: (C)2008,JPO&INPIT
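The candidate-generation and frequency-data steps can be sketched as follows. The per-character readings and the frequency table are invented examples; a real system would take them from pronunciation dictionaries and language model data.

```python
from itertools import product

# Hypothetical predetermined pronunciations per character.
char_pron = {"東": ["tou", "higashi"], "京": ["kyou", "kei"]}

# Hypothetical language model data: appearance frequency per
# (phrase, pronunciation) pair.
freq = {("東京", "toukyou"): 120, ("東京", "higashikei"): 1}

def pronunciation_candidates(phrase):
    """All combinations of the per-character pronunciations."""
    return ["".join(p) for p in product(*(char_pron[c] for c in phrase))]

def frequency_data(phrase):
    """Pair each candidate with its recorded frequency (0 when unseen)."""
    return {(phrase, p): freq.get((phrase, p), 0)
            for p in pronunciation_candidates(phrase)}
```

The frequency data produced this way is what the speech recognizer then consults when matching phrases and pronunciations in the input speech.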