Abstract:
Speaker adaptation is provided that makes it easy for a person to use a hidden Markov model (HMM) based recognizer previously trained by another person or persons. During training, the parameters of the Markov models are calculated iteratively, for example using the Forward-Backward algorithm. The adaptation comprises storing and reusing the intermediate results, or probabilistic frequencies, of the last iteration. During adaptation, the parameters are calculated by interpolation, that is, from a weighted sum of the stored probabilistic frequencies and those obtained from new training data.
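The interpolation step can be illustrated with a minimal Python sketch. It assumes the stored intermediate results are per-state expected counts of output labels from the last Forward-Backward iteration; the function names, the single `weight` parameter and the table layout are hypothetical and are not taken from the abstract.

```python
def interpolate_counts(stored, new, weight):
    """Weighted sum of stored counts and counts from new adaptation data.

    stored, new: lists of rows (one row per HMM state, one entry per label).
    weight: share given to the stored counts, in [0, 1] (assumed parameter).
    """
    return [
        [weight * s + (1.0 - weight) * n for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(stored, new)
    ]


def normalize_rows(counts):
    """Turn each row of counts into a probability distribution."""
    probs = []
    for row in counts:
        total = sum(row)
        if total > 0:
            probs.append([c / total for c in row])
        else:
            # No evidence at all for this state: fall back to uniform.
            probs.append([1.0 / len(row)] * len(row))
    return probs


def adapt_output_probs(stored, new, weight):
    """Re-estimate output probabilities from interpolated counts."""
    return normalize_rows(interpolate_counts(stored, new, weight))
```

For example, `adapt_output_probs([[8.0, 2.0]], [[1.0, 3.0]], 0.5)` mixes the two count rows into `[4.5, 2.5]` before normalizing, so a small amount of new-speaker data shifts the distribution without discarding what the original training found.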
Abstract:
PROBLEM TO BE SOLVED: To provide technology for extracting a target voice from a plurality of voices arriving from different directions by efficiently suppressing the mixing-in of voices other than the target voice. SOLUTION: The target voice is extracted by performing at least one of gain adjustment processing and utterance-section segmentation processing on the voice signals obtained by first and second voice input units arranged a predetermined distance apart, using a weighted Cross-Power Spectrum Phase (CSP) coefficient that takes a small value in frequency bands likely to be influenced by voices other than the target voice. COPYRIGHT: (C)2011,JPO&INPIT
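As a rough illustration of a weighted CSP coefficient, the following Python sketch whitens the cross-power spectrum of two frames, multiplies each frequency bin by a weight, and transforms back to the lag domain. The abstract does not say how the weights are derived, so `weights` is a hypothetical input here (small values in bins assumed to be dominated by other voices), and the naive DFT stands in for an FFT only to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def weighted_csp(x1, x2, weights):
    """Weighted CSP coefficients for two equal-length frames.

    The index of the largest coefficient is the (circular) delay of x2
    relative to x1. weights holds one non-negative value per frequency bin.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    whitened = []
    for k in range(n):
        cross = X1[k].conjugate() * X2[k]
        mag = abs(cross)
        # Keep only the phase of the cross-power spectrum, then weight it.
        whitened.append(weights[k] * cross / mag if mag > 1e-12 else 0j)
    return [v.real for v in dft(whitened, inverse=True)]
```

With all weights set to 1 this reduces to the ordinary CSP coefficient. Lowering the weight of a bin reduces its contribution to every lag, which is how bands affected by interfering voices can be de-emphasized.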
Abstract:
PROBLEM TO BE SOLVED: To synthesize speech with high sound quality when many phonemes are available, by exploiting the advantages of waveform-concatenation speech synthesis, and to synthesize speech with accurate accent even when fewer phonemes are available. SOLUTION: Prosody that is both accurate and of high sound quality is obtained by a two-pass search: a phoneme search followed by a search for prosody correction amounts. In a preferred embodiment, in both passes the consistency of the prosody is evaluated using a statistical model of the amount of change of the prosody (the slope of the fundamental frequency) to secure accurate accent. The search for prosody correction amounts finds the sequence of correction amounts that minimizes the corrected-prosody cost. This yields a correction sequence that increases the likelihood under the statistical models of both the amount of change and the absolute value of the prosody while keeping the corrections as small as possible. COPYRIGHT: (C)2009,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To provide a voice recognition device that adequately recognizes the original voice even when reverberation from the surrounding environment is superimposed on it, together with a voice recognition method, a computer-executable program for causing a computer to perform the method, and a storage medium. SOLUTION: The voice recognition device, which is configured to include a computer, includes a means 20 for storing feature values obtained from a voice signal for each frame; means 24 and 26 for storing acoustic model data and language model data; a means 18 for generating reverberation voice model data from a voice signal acquired earlier than the voice signal being processed at that time, and for generating matching acoustic model data using the reverberation voice model data; and a means 16 for providing the recognition result for the voice signal by referring to the feature values, the matching acoustic model data, and the language model data. COPYRIGHT: (C)2005,JPO&NCIPI
Abstract:
PROBLEM TO BE SOLVED: To achieve highly accurate voice recognition that copes adequately with abruptly changing noise, such as noise that arises suddenly or irregularly. SOLUTION: Voice recognition is performed using a synthesized HMM, obtained by combining a voice HMM (hidden Markov model) with a noise HMM, and matching the feature values of the input voice against the synthesized HMM for every frame of the input voice. COPYRIGHT: (C)2004,JPO
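The abstract does not say how the voice HMM and the noise HMM are combined. One common construction, shown here purely as an assumption, is a product state space in which the two chains evolve independently, so each combined state is a (voice state, noise state) pair. The Python sketch below builds only the combined transition matrix; combining the output distributions depends on the feature domain and is left out. The function name is hypothetical.

```python
def compose_transitions(voice_a, noise_a):
    """Transition matrix of a product HMM (assumed construction).

    voice_a, noise_a: row-stochastic transition matrices as lists of rows.
    Combined state index = voice_state * number_of_noise_states + noise_state.
    """
    nv, nn = len(voice_a), len(noise_a)
    size = nv * nn
    combined = [[0.0] * size for _ in range(size)]
    for i in range(nv):          # current voice state
        for j in range(nn):      # current noise state
            for k in range(nv):  # next voice state
                for m in range(nn):  # next noise state
                    # Independence assumption: joint probability is the product.
                    combined[i * nn + j][k * nn + m] = voice_a[i][k] * noise_a[j][m]
    return combined
```

Because the noise chain can change state at any frame, decoding against the combined model lets the noise state switch frame by frame, which fits the stated goal of handling noise that appears suddenly.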
Abstract:
PROBLEM TO BE SOLVED: To provide a more suitable model-setting function for removing redundant words from information obtained through user assignment or speaker registration. SOLUTION: In addition to a general-purpose language model 320, the language model 32a includes a redundant-word language model 322 dedicated to redundant words. Using this redundant-word language model 322, redundant words can be deleted automatically.
Abstract:
PURPOSE: To provide stable speech synthesis with reduced trembling of the pitch in a speech synthesizer system that uses a pitch-synchronous waveform superposition method. CONSTITUTION: The glottis closing point is used as the reference point for superposition (the pitch mark). Because the glottis closing point can be extracted stably and accurately using a dynamic wavelet transformation, speech with little trembling and little gurgling is synthesized. Setting the reference point for superposition at a different position from the reference point for waveform cutout allows a softer cutout of the waveform. The glottis closing point is extracted by searching for local peaks of the dynamic wavelet transformation; preferably, the threshold for this local-peak search is controlled adaptively each time a dynamic wavelet transformation is obtained.
Abstract:
PURPOSE: To provide a speech recognition device that efficiently represents various vocalization deformations with a statistical combination of a small number of kinds of HMMs. CONSTITUTION: A feature extracting device 4 analyzes the features of an input word to obtain a corresponding feature vector sequence, or a label sequence via a labeling device 8. For every vocalization deformation candidate of a subword, a phonemic hidden Markov model is given an N-gram relation (N: an integer larger than two) with the speech deformation candidate of the preceding subword in the word, and this is held in a parameter table 18. The recognition device 16 applies the HMM for each speech deformation candidate on the basis of the N-gram relation, according to the description of each candidate word in a recognition object word vocalization dictionary 13, and constitutes a speech model by connecting the HMMs of the speech deformation candidates in parallel between the subwords. For each candidate word, it finds the probability that the constituted speech model outputs the label sequence or feature vector sequence of the spoken input word, and outputs the candidate word whose speech model has the highest probability as the recognition result to a display device 19.
Abstract:
PURPOSE: To enable a signal processing card installed on the bus of a personal computer to perform speech recognition using signal processing data, in addition to speech signal processing. CONSTITUTION: The signal processing card 5 installed on the bus 2 of the personal computer 1 has a bus master 6, through which the main memory 4 of the computer 1 is accessed. A large probability value table needed for speech recognition is stored in the main memory 4. Each time a label to be processed arrives, the necessary part of the table is read into a memory on the card 5 by DMA transfer performed by the bus master 6, and speech recognition processing is executed.
Abstract:
PROBLEM TO BE SOLVED: To provide a technology for extracting an off-topic part from a conversation. SOLUTION: The system for extracting an off-topic part from a conversation includes: a first corpus containing documents from a plurality of fields; a second corpus containing only documents from the field to which the conversation belongs; a determination means for determining, as a lower-limit target word, a word for which the IDF value for the first corpus and the IDF value for the second corpus are each below a first predetermined threshold; a score calculation part for calculating a TF-IDF value as a score for each word contained in the second corpus, wherein the score calculation part uses a constant lower-limit setting instead of a TF-IDF value for the lower-limit target word; a clipping part for sequentially cutting out intervals to be processed from text data representing the content of the conversation; and an extraction part for extracting, as an off-topic part, an interval for which the average of the scores of the words contained in the cut-out interval is greater than a second predetermined threshold.
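The scoring and extraction steps can be sketched in Python as follows. Several details are assumptions, since the abstract does not fix them: the smoothed IDF formula, taking both TF and IDF from the second (domain) corpus for ordinary words, a fixed-width sliding window as the cut-out interval, and a default score of 0.0 for words missing from the score table. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def idf(word, docs):
    """Smoothed inverse document frequency; docs is a list of token lists."""
    df = sum(1 for doc in docs if word in doc)
    return math.log((1 + len(docs)) / (1 + df))


def word_scores(general_docs, domain_docs, idf_threshold, floor):
    """Score every word of the domain corpus, flooring common words."""
    counts = Counter(tok for doc in domain_docs for tok in doc)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        low_in_general = idf(word, general_docs) < idf_threshold
        low_in_domain = idf(word, domain_docs) < idf_threshold
        if low_in_general and low_in_domain:
            # Word is common in both corpora: use the constant floor.
            scores[word] = floor
        else:
            # Assumed TF-IDF definition: relative frequency times domain IDF.
            scores[word] = (count / total) * idf(word, domain_docs)
    return scores


def off_topic_intervals(tokens, scores, window, threshold, default=0.0):
    """Return (start, end) token ranges whose mean score exceeds threshold."""
    hits = []
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        mean = sum(scores.get(tok, default) for tok in chunk) / window
        if mean > threshold:
            hits.append((start, start + window))
    return hits
```

In this sketch the floor keeps words that are common everywhere from dragging interval averages toward zero, so the second threshold is compared against averages that are not dominated by function words.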