Abstract:
PROBLEM TO BE SOLVED: To provide a speech enhancement technique that remains effective against sudden noise for which no noise-only section is available and against unknown sudden noise. SOLUTION: A signal enhancement device comprises spectrum subtracting means 13a, 13b, 15 for subtracting a specified reference signal from an input signal containing a target signal and a noise signal, an adaptive filter 14 applied to the reference signal, and coefficient control means for controlling a filter coefficient of the adaptive filter so as to reduce the noise components of the input signal. The device is further provided with a database 16 holding a signal model that represents a specified quantity of the target signal with a specified statistical model, and controls the filter coefficient according to the likelihood of the signal model with respect to the output signal of the spectrum subtracting means. COPYRIGHT: (C)2005,JPO&NCIPI
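As a rough illustration of the coefficient control described above, the following Python sketch subtracts a scaled reference spectrum from the input spectrum and keeps the scale whose output is most likely under a stored signal model. The scalar coefficient, the grid of candidate values, the diagonal Gaussian model, the spectral floor and all function names are assumptions made for this example; the abstract does not specify any of them.

```python
import math

def spectral_subtract(noisy_mag, ref_mag, coeff, floor=0.01):
    """Subtract coeff * reference from each magnitude bin, with a spectral floor."""
    return [max(x - coeff * r, floor * x) for x, r in zip(noisy_mag, ref_mag)]

def gaussian_log_likelihood(values, means, variances):
    """Log-likelihood of the bins under an assumed diagonal Gaussian signal model."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(values, means, variances)
    )

def choose_coefficient(noisy_mag, ref_mag, model_means, model_vars, candidates):
    """Pick the candidate coefficient whose subtracted output best fits the model."""
    return max(
        candidates,
        key=lambda c: gaussian_log_likelihood(
            spectral_subtract(noisy_mag, ref_mag, c), model_means, model_vars
        ),
    )

# Toy data: the input is the modelled target plus 0.5 times the reference.
model_means = [1.0, 2.0, 3.0]
model_vars = [0.1, 0.1, 0.1]
reference = [1.0, 1.0, 1.0]
noisy = [1.5, 2.5, 3.5]

best = choose_coefficient(noisy, reference, model_means, model_vars,
                          [0.0, 0.25, 0.5, 0.75, 1.0])
enhanced = spectral_subtract(noisy, reference, best)
```

A real device would adapt a full filter per frame rather than search a fixed grid; the grid search only makes the likelihood criterion easy to see.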
Abstract:
PROBLEM TO BE SOLVED: To insert punctuation marks at suitable positions in a sentence. SOLUTION: An acoustic processing part 20 processes input voice data and converts it into feature vectors. When automatic punctuation insertion is not executed, a language mark-reproduction part 22 processes the feature vectors using only a general-purpose language model 320 and inserts a punctuation mark only where the voice data explicitly indicates one, for example where 'comma' is spoken. When automatic punctuation insertion is executed, the language mark-reproduction part 22 uses both the general-purpose language model 320 and a punctuation language model 322 to recognize a pause containing no voice as a comma ',' or the like.
Abstract:
PROBLEM TO BE SOLVED: To enable word-unit voice recognition of Japanese. SOLUTION: A user divides a prescribed sentence into words, and the correspondence between the user's words and the individual morphemes of the prescribed sentence is examined. The user's word-division tendency is judged from this correspondence, and the morphemes in an example sentence database are combined into words that match this tendency to form a word group. A word-unit N-gram 13 and a word-unit acoustic model 11 are built from the word group, and a voice recognition device is constructed using them.
Abstract:
PURPOSE: To leave the start end open independently of the structure of a Markov model and to properly suppress the increase in processing quantity by discarding impossible matching paths based on the intermediate likelihood values in each fine interval. CONSTITUTION: When the leading input label 1 in a speech section is obtained, likelihood values E on a trellis are computed in the vertical direction by the Viterbi algorithm within the range allowed by the slope limit of the model. This operation is applied to the fenonic word models F of all vocabulary words to be processed (step 27). Each time one frame is processed, the vertical maximum likelihood value Emax(i) on the trellis for input frame (i) is found, and only the values within a fixed range of that maximum are recorded (steps 28 to 31) before the succeeding input frame is processed. In other words, each succeeding frame continues processing only for the words and state positions left on the trellis by the preceding frame.
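The frame-by-frame pruning described above can be sketched in Python as a beam-pruned Viterbi pass. This is a minimal sketch for a single left-to-right model working in log probabilities; the data layout, the function name `beam_viterbi` and the toy probabilities are assumptions for illustration, not the procedure of the patent itself.

```python
import math

def beam_viterbi(log_emit, log_trans, beam):
    """Return the states still active after the last frame, with their scores.

    log_emit[t][s] : log probability of the label at frame t in state s
    log_trans[s]   : dict mapping each successor state to a log transition prob
    beam           : keep only scores within `beam` of the per-frame maximum
    """
    active = {0: log_emit[0][0]}  # start in state 0
    for t in range(1, len(log_emit)):
        candidates = {}
        # Extend only the states that survived the previous frame.
        for state, score in active.items():
            for nxt, lp in log_trans[state].items():
                new_score = score + lp + log_emit[t][nxt]
                if new_score > candidates.get(nxt, -math.inf):
                    candidates[nxt] = new_score
        # Per-frame maximum, the counterpart of Emax(i) in the abstract.
        e_max = max(candidates.values())
        active = {s: v for s, v in candidates.items() if v >= e_max - beam}
    return active

# Toy three-state left-to-right model with self-loops.
log_trans = {
    0: {0: math.log(0.5), 1: math.log(0.5)},
    1: {1: math.log(0.5), 2: math.log(0.5)},
    2: {2: 0.0},
}
log_emit = [
    [math.log(0.90), math.log(0.05), math.log(0.05)],
    [math.log(0.10), math.log(0.80), math.log(0.10)],
    [math.log(0.01), math.log(0.10), math.log(0.89)],
]

narrow = beam_viterbi(log_emit, log_trans, beam=math.log(4.0))
wide = beam_viterbi(log_emit, log_trans, beam=100.0)
```

With the narrow beam only the best-scoring state survives each frame, so later frames do less work; with the wide beam every reachable state is kept.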
Abstract:
PURPOSE: To enable high-accuracy recognition by making the output probability of the identifier of a spectrum variation quantity prototype for recognizing a probability model common among probability models that have the same model identifier with regard to spectrum data and a spectrum variation quantity. CONSTITUTION: Markov models in label units, each having independent label output probabilities, are prepared for the individual spectra and spectrum variation quantities. When the parameters of the Markov models are estimated, a switching device 14 is switched to the side of a parameter estimation device 16 for the models, and a word base form table 18, in which sequences of label couples are registered, is used to train the models, thereby determining the parameter values of the parameter table 20. When recognition is carried out, the switching device 14 is switched to the side of the recognition device 17, and input speech is recognized based on the sequence of label couples, the base form table 18, and the parameter table 19. Consequently, high-accuracy recognition is achieved without greatly increasing the calculation quantity or the storage quantity.
Abstract:
PROBLEM TO BE SOLVED: To provide an information processor, an information processing method, an information processing system and a program for analyzing a phrase that reflects information not expressed explicitly in words. SOLUTION: An information processor 120 uses voice data recording dialogs to identify information that is not clearly specified in words in the voice data, and comprises: an acoustic analysis unit 208 for executing acoustic analysis of the voice data by using acoustic data; a prosodic information acquisition unit 212 for identifying a region of the voice data delimited before and after by pauses, identifying a phrase in the identified region by using the acoustic analysis of that region, and generating one or more prosodic feature values for the phrase, with each prosodic feature value of the phrase as an element; an appearance frequency acquisition unit 210 for acquiring the appearance frequency, in the voice data, of the phrase obtained by the acoustic analysis unit 208; and a prosodic variation analysis unit 214 for calculating the degree of variation of the prosodic feature values of phrases with a high appearance frequency in the voice data, and determining a feature phrase.
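The final step, selecting a feature phrase from frequent phrases by how much their prosody varies, might look like the following Python sketch. It assumes a single positive prosodic feature per occurrence (for example, duration in seconds), uses the coefficient of variation as the "variation degree", and applies a fixed count threshold; none of these choices, nor the function name, comes from the abstract.

```python
from collections import defaultdict
from statistics import mean, pstdev

def feature_phrases(observations, min_count=3, top_k=1):
    """Rank frequent phrases by the variation of one prosodic feature.

    observations : iterable of (phrase, feature_value) pairs, one per occurrence
    min_count    : assumed threshold for "high appearance frequency"
    top_k        : number of phrases to return
    """
    by_phrase = defaultdict(list)
    for phrase, value in observations:
        by_phrase[phrase].append(value)

    scored = []
    for phrase, values in by_phrase.items():
        if len(values) < min_count:
            continue  # too rare to judge its variation
        # Coefficient of variation; assumes strictly positive feature values.
        scored.append((pstdev(values) / mean(values), phrase))

    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top_k]]

# Toy data: "well" is frequent and its duration varies a lot.
observations = [
    ("yes", 0.20), ("yes", 0.21), ("yes", 0.19),
    ("well", 0.20), ("well", 0.60), ("well", 0.40),
    ("rare", 0.10), ("rare", 2.00),
]
selected = feature_phrases(observations)
```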
Abstract:
PROBLEM TO BE SOLVED: To accurately collect the speech of only a specified speaker, such as a salesperson in over-the-counter sales. SOLUTION: A speech collection system 10 extracts and collects target speech from among a plurality of pieces of speech arriving from mutually different directions. The system includes a microphone array 11 having at least first and second microphones 11a and 11b, which are separated by a predetermined distance. A discrete Fourier transform is performed on each speech signal received by the first and second microphones, a plurality of cross spectrum power (CSP) coefficients related to the arrival direction of the speech are calculated, and a plurality of speech signals are detected from the CSP coefficients. Then, a speech direction index, defined according to the angle between the line connecting the first and second microphones and the arrival direction, is detected from the calculated CSP coefficients, and the target speech signal is extracted from the detected speech signals based on the detected speech direction index. COPYRIGHT: (C)2010,JPO&INPIT
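For a two-microphone pair, the CSP calculation above can be sketched in Python as follows. The sketch uses the common formulation in which the cross spectrum is normalized by its magnitude before the inverse transform (often called cross-power spectrum phase analysis, closely related to GCC-PHAT); whether the patent uses exactly this normalization is an assumption, as are the function names, the naive DFT and the speed-of-sound default.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out

def csp_coefficients(sig1, sig2, eps=1e-12):
    """Inverse DFT of the magnitude-normalized cross spectrum of two frames."""
    s1, s2 = dft(sig1), dft(sig2)
    cross = [a * b.conjugate() for a, b in zip(s1, s2)]
    whitened = [c / (abs(c) + eps) for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]

def estimate_delay(sig1, sig2):
    """Delay of sig1 relative to sig2 in samples (positive if sig1 is later)."""
    csp = csp_coefficients(sig1, sig2)
    n = len(csp)
    peak = max(range(n), key=lambda i: csp[i])
    return peak if peak <= n // 2 else peak - n  # map upper half to negative lags

def delay_to_angle(delay_samples, sample_rate, mic_distance, sound_speed=343.0):
    """Angle in degrees between the microphone axis and the arrival direction."""
    cos_theta = sound_speed * delay_samples / (sample_rate * mic_distance)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

# Toy frames: the same impulse reaches microphone 1 two samples after microphone 2.
mic2 = [0.0] * 16
mic2[3] = 1.0
mic1 = [0.0] * 16
mic1[5] = 1.0

delay = estimate_delay(mic1, mic2)
angle = delay_to_angle(delay, sample_rate=16000, mic_distance=0.1)
```

With several simultaneous talkers the CSP sequence shows several peaks, one per direction, and the peak matching the target direction would be the one to keep.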
Abstract:
PROBLEM TO BE SOLVED: To efficiently and accurately recognize the accent of input voice. SOLUTION: Notation data for learning showing the notation of each phrase of a text for learning, utterance data for learning showing characteristics of the utterance of each phrase, and boundary data for learning showing whether or not each phrase is the boundary of an accent phrase are stored. A candidate for the boundary data is input, and a first likelihood, that the accent phrase boundaries of the phrases of the input text coincide with the input candidate, is calculated from input notation data showing the notation of the input text representing the content of the input voice, the notation data for learning, and the boundary data for learning. A second likelihood, that the utterance of each phrase of the input text becomes the utterance indicated by input utterance data when the input voice has the accent phrase boundaries indicated by the candidate for the boundary data, is calculated from the input utterance data showing characteristics of the utterance of each phrase of the input voice, the utterance data for learning, and the boundary data for learning. The candidate for the boundary data that maximizes the product of the first likelihood and the second likelihood is searched for, and the result is output. COPYRIGHT: (C)2008,JPO&INPIT
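The search over boundary candidates can be illustrated with a small Python sketch that scores every binary boundary sequence by the product of the two likelihoods. It assumes the likelihoods factor independently per phrase and that both have already been estimated from the learning data; the probability tables and the function name are hypothetical, and an exhaustive search is used only because the example is tiny.

```python
from itertools import product

def best_boundaries(boundary_prob, utterance_lik):
    """Return the boundary candidate maximizing first * second likelihood.

    boundary_prob[i] : assumed P(phrase i ends an accent phrase | notation)
    utterance_lik[i] : pair (likelihood of the observed utterance if phrase i
                       is not a boundary, likelihood if it is a boundary)
    """
    best_candidate, best_score = None, -1.0
    for candidate in product((0, 1), repeat=len(boundary_prob)):
        first = 1.0   # likelihood from the notation (text) side
        second = 1.0  # likelihood from the utterance (acoustic) side
        for is_boundary, p, lik in zip(candidate, boundary_prob, utterance_lik):
            first *= p if is_boundary else 1.0 - p
            second *= lik[is_boundary]
        score = first * second
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score

# Toy values for a three-phrase input.
boundary_prob = [0.9, 0.2, 0.6]
utterance_lik = [(0.1, 0.8), (0.7, 0.3), (0.6, 0.3)]

candidate, score = best_boundaries(boundary_prob, utterance_lik)
```

For the third phrase the text alone favours a boundary (0.6), but the utterance evidence outweighs it, which is the point of combining the two likelihoods. For realistic sentence lengths a dynamic-programming search would replace the exhaustive loop.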
Abstract:
PROBLEM TO BE SOLVED: To provide a device for supporting the design of a voice interface that receives a plurality of kinds of voice control, and a program and a method therefor. SOLUTION: The device comprises a database for recording speech samples, each associated with one of the plurality of kinds of voice control; a similarity calculation part for calculating the similarity between a first set of speech samples associated with a first voice control and a second set of speech samples associated with a second voice control; and a display part for displaying the similarity between the first set and the second set. The display part preferably displays a graph in which points corresponding to each of the plurality of kinds of voice control are plotted so that the similarity is expressed. COPYRIGHT: (C)2008,JPO&INPIT