Abstract:
PROBLEM TO BE SOLVED: To insert punctuation marks at suitable positions in a sentence. SOLUTION: An acoustic processing part 20 processes input voice data and converts it into feature vectors. When automatic punctuation insertion is not executed, a language mark-reproduction part 22 processes the feature vectors using only a versatile language model 320 and inserts a punctuation mark only where the voice data explicitly indicates one, for example by the spoken word 'comma' or the like. When automatic punctuation insertion is executed, the language mark-reproduction part 22 uses both the versatile language model 320 and a punctuation language model 322 to identify a pause part having no voice as a comma ',' or the like.
Abstract:
PROBLEM TO BE SOLVED: To provide a practical voice recognition system and the like in which recognition performance is improved by taking utterance variation into account. SOLUTION: The system includes a voice recognition device 200 and a pre-processor 100 that creates a recognition graph used by the voice recognition device 200 for voice recognition processing. The pre-processor 100 comprises: a language model estimation section 110 for estimating a language model; a recognition word dictionary section 130 holding correspondence information that associates a word with a phoneme string following the word's written form and with a phoneme string in which utterance variation is described; and a recognition graph creating section 140 for creating a recognition graph on the basis of the language model estimated by the language model estimation section 110 and the correspondence information held by the recognition word dictionary section 130 for the words included in the language model. The recognition graph creating section 140 creates the recognition graph by applying the utterance-variation phoneme string to words that are included in a word string composed of more than a fixed number of words. COPYRIGHT: (C)2010,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To search for a new phrase to be registered in the dictionary of a dividing means that breaks a text down into phrases. SOLUTION: The system inputs a learning text into the dividing means, which breaks it down into phrases and produces breakdown candidates, each containing a different combination of phrases, together with the reliability of each breakdown. For each phrase, the system sums the reliabilities of the breakdown candidates that include that phrase to obtain the phrase's likelihood. Then, among the combinations of phrases that appear in at least one candidate, and within the extent that the text can still be expressed using the phrases in the combination, it finds the combination that minimizes the information entropy of the phrases, each phrase being assumed to appear at a frequency matching its likelihood, and outputs it as a combination of phrases including the new phrase. COPYRIGHT: (C)2008,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To provide a data processing method, and a system using the same, suitable for transcribing speech recorded in a special situation such as a trial or a meeting into text, by establishing a proper correspondence between corrected text and the original speech even when the text written down through speech recognition has been corrected. SOLUTION: The system is equipped with: a speech recognition processing part 32 which specifies utterance sections in speech data, performs speech recognition on each utterance section, and correlates the character strings of the recognition data obtained for each utterance section with the speech data according to utterance time information; and an output control part 34 which displays a text created by sorting the recognition data for each utterance section. The system is further equipped with: a text editing part 35 which edits the created text; and a speech correspondence estimation part 36 which correlates character strings in the edited text with the speech data by using a dynamic programming technique. COPYRIGHT: (C)2005,JPO&NCIPI
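The correspondence estimation described above can be illustrated with a minimal sketch. The abstract says only that a dynamic programming technique is used, so the edit-distance alignment below is an assumed stand-in, not the patented method. The function name `align_edited_text`, the token-level granularity, and the rule that a substituted token inherits the start time of the token it replaces are all hypothetical choices made for illustration.

```python
def align_edited_text(recognized, edited):
    """Align edited tokens to recognized tokens (assumed edit-distance DP).

    recognized : list of (token, start_time) pairs from speech recognition
    edited     : list of tokens after manual correction
    Returns a list of (edited_token, start_time or None).
    """
    n, m = len(recognized), len(edited)
    # cost[i][j] = edit distance between recognized[:i] and edited[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if recognized[i - 1][0] == edited[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # recognized token deleted
                             cost[i][j - 1] + 1)        # edited token inserted
    # Trace back from the end to recover one optimal alignment.
    aligned = []
    i, j = n, m
    while j > 0:
        if i > 0:
            sub = 0 if recognized[i - 1][0] == edited[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                # Matched or substituted: reuse the recognized start time.
                aligned.append((edited[j - 1], recognized[i - 1][1]))
                i -= 1
                j -= 1
                continue
            if cost[i][j] == cost[i - 1][j] + 1:
                i -= 1  # recognized token was removed by the editor
                continue
        # Inserted by the editor: no speech position is known.
        aligned.append((edited[j - 1], None))
        j -= 1
    aligned.reverse()
    return aligned
```

In this sketch, tokens inserted by the editor receive `None` as their time; a fuller system would presumably interpolate a position from neighbouring tokens.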
Abstract:
PROBLEM TO BE SOLVED: To simultaneously estimate a word and a syntactic structure with a high precision by providing a probability model allowing selection of a range of a history used for estimation and using this probability model as a structural language model with respect to processing for estimating the next data element on the basis of the history having a tree structure. SOLUTION: With respect to a word estimating method for voice recognition using a computer, the tree structure of the history of words preceding a word as the estimation object is specified, and a context tree which is stored in a tree-like context tree storage part 40 and has information related to structures allowed for a sentence and appearance probabilities of words for these structures as nodes is referred to, and a word is estimated on the basis of the context tree and the specified sentence structure of the history.
Abstract:
PROBLEM TO BE SOLVED: To provide a device and a method for voice recognition which have a higher recognition rate than conventional ones. SOLUTION: Words are divided into redundant words and other, normal words, and prediction distinguishes between the two both for the word being predicted and for the preceding word used as the condition, which improves the precision of word prediction around redundant words. To this end, the voice recognition device has an acoustic processing means which converts an analog voice input signal into a digital signal; a storage means which stores acoustic models that have learnt features of sounds; a storage means holding a dictionary which has both 1st language models, learnt in advance on the basis of a document containing redundant words and normal words other than the redundant words, and 2nd language models, learnt on the basis of a document of only normal words with redundant words ignored; and a means which, using the acoustic models and the dictionary, calculates probabilities for the digital signal and recognizes the word having the highest probability as the inputted voice.
Abstract:
PURPOSE: To balance the limit on the number of candidates that can be presented against the desire to predict as far ahead as possible, by switching effectively between performing only a one-character prediction and performing a word prediction. CONSTITUTION: A reading candidate character string is recognized from the reading information inputted from a coordinate input/display device 11 via a character input part 7. For every recognized reading candidate character string, the characters which can follow the reading candidate character string and their incidence probabilities (branch probabilities) are acquired by retrieving a dictionary. The probability L of each predicted character string is determined. At this time, it is judged whether the number of predicted character strings exceeds the maximum number N that can be presented to a user as candidate character strings. The words are sorted in descending order of L. If the difference between the sum total Lc of L over the top N words and the sum total Ld of L over the words ranked N+1 and lower is a prescribed value or more, the words predicted by performing an extension are presented to the user.
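The switching rule at the end of this abstract can be sketched as follows. The function name `choose_presentation`, the dictionary input, and the fallback to one-character prediction when the test fails are assumptions for illustration; the abstract states only the condition under which the extended word predictions are presented.

```python
def choose_presentation(candidates, max_shown, min_gap):
    """Decide between word prediction and one-character prediction.

    candidates : dict mapping a predicted string to its probability L
    max_shown  : N, the maximum number of candidates shown to the user
    min_gap    : prescribed minimum for Lc - Ld
    Returns ("word", top_n_strings) or ("character", []).
    """
    # Sort predicted strings in descending order of L.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:max_shown], ranked[max_shown:]
    lc = sum(p for _, p in top)   # probability mass of the N shown candidates
    ld = sum(p for _, p in rest)  # probability mass of everything ranked below N
    if lc - ld >= min_gap:
        return "word", [w for w, _ in top]
    # Assumed fallback: the shown words would miss too much probability mass.
    return "character", []
```

The design intuition is that word-level candidates are worth showing only when the N visible slots capture most of the probability mass; otherwise a shorter one-character prediction loses less.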
Abstract:
PURPOSE: To enable post-processing based on transition probabilities in a language with many character sets, such as Japanese, by attaching attributes such as parts of speech to the candidate characters obtained from character recognition and then evaluating the transition probabilities. CONSTITUTION: In a post-processing device that selects, from the strings of candidate character groups (character lattices) obtained by recognizing Japanese character strings, the combination of candidate characters that is optimum in terms of character transition probability, a character/part of speech correspondence storage means 10 stores, for each character, the parts of speech that the character can take, and a part of speech corresponding means 11 associates parts of speech with the respective candidate characters based on the stored contents. A character transition probability storage means 12 stores the transition probabilities with which characters tagged with parts of speech connect to each other, and a connection relation evaluation means 13 evaluates, for each candidate character tagged with a part of speech, its connection relation with the preceding candidate characters in the character lattice, based on the stored contents of the transition probability storage means 12. An optimum path selecting means 15 then selects the candidate characters whose connection relation is optimum.
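As a rough illustration of the path selection described above, the sketch below runs a Viterbi-style search over a character lattice whose states are (character, part of speech) pairs. The abstract does not name the search algorithm, so Viterbi is an assumed choice; the function name `best_path`, the `"UNK"` tag for characters missing from the table, and the `floor` probability for unseen transitions are likewise hypothetical.

```python
import math

def best_path(lattice, char_pos, trans, floor=1e-6):
    """Return (text, tagged_path) for the most probable path through a lattice.

    lattice  : non-empty list of non-empty columns of candidate characters
    char_pos : dict mapping a character to the parts of speech it can take
    trans    : dict mapping ((char, pos), (char, pos)) to a transition probability
    floor    : probability assumed for transitions missing from `trans`
    """
    # Expand every candidate character into (character, part-of-speech) states.
    columns = [[(c, p) for c in col for p in char_pos.get(c, ["UNK"])]
               for col in lattice]
    # best[state] = (log-probability of the best path ending in state, path)
    best = {s: (0.0, [s]) for s in columns[0]}
    for col in columns[1:]:
        nxt = {}
        for s in col:
            # Pick the predecessor that maximizes the accumulated log-probability.
            lp, path = max(
                ((plp + math.log(trans.get((prev, s), floor)), ppath)
                 for prev, (plp, ppath) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (lp, path + [s])
        best = nxt
    _, path = max(best.values(), key=lambda t: t[0])
    return "".join(c for c, _ in path), path
```

Log-probabilities are summed rather than multiplying raw probabilities so that long character strings do not underflow.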
Abstract:
PURPOSE: To post-process a Japanese sentence input from an OCR with sufficiently high accuracy and speed. CONSTITUTION: After grammatically valid paths are searched for based on the recognition result and the constraints of Japanese, the cost of each available path is calculated and a plurality of candidate paths having suitable cost values are selected. Then the conviction Cf of each character candidate (or set of candidates) in each column is calculated from the cost g(1) of a candidate path passing through that character candidate (or those candidates) and the cost g(2) of a candidate path passing through another character candidate (or other character candidates). Candidates are then substituted, or a warning is issued to an operator, based on the calculated value.
Abstract:
PROBLEM TO BE SOLVED: To provide a technique for extracting idle talk parts from a conversation. SOLUTION: An idle talk extraction system for extracting idle talks from a conversation comprises: a first corpus including documents in a plurality of fields; a second corpus including only documents in a field to which the conversation belongs; a determination part to determine as a lower limit subject word a word for which an idf value for the first corpus and an idf value for the second corpus are each below a first prescribed threshold value, for words included in the second corpus; a score calculation part to calculate as a score a tf-idf value for each word included in the second corpus and, for the lower limit subject word, use a constant set as a lower limit instead of the tf-idf value; a clipping part to sequentially cut out intervals to be processed, from text data of contents of the conversation; and an extraction part to extract as an idle talk part an interval where an average value of the score of words included in the interval is larger than a second prescribed threshold value.
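The pipeline in this abstract (determination, scoring, clipping, extraction) can be sketched as below. Several details are assumptions, because the abstract does not give them: which corpus supplies tf and which supplies idf (here tf comes from the first, multi-field corpus and idf from the second, in-field corpus), the smoothed idf formula, the treatment of conversation words absent from the second corpus (they receive the lower-limit constant), and all function and parameter names.

```python
import math
from collections import Counter

def idf(word, docs):
    """Smoothed inverse document frequency; docs is a list of token lists."""
    df = sum(1 for d in docs if word in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

def word_scores(general_docs, domain_docs, idf_threshold, floor):
    """Score every word of the in-field (second) corpus."""
    general_tf = Counter(w for d in general_docs for w in d)
    total = sum(general_tf.values()) or 1
    scores = {}
    for w in {w for d in domain_docs for w in d}:
        idf_g, idf_d = idf(w, general_docs), idf(w, domain_docs)
        if idf_g < idf_threshold and idf_d < idf_threshold:
            # Lower limit subject word: common in both corpora, so it gets
            # the constant instead of a tf-idf value.
            scores[w] = floor
        else:
            # Assumed pairing: tf from the general corpus, idf from the domain corpus.
            scores[w] = (general_tf[w] / total) * idf_d
    return scores

def extract_idle_talk(tokens, scores, window, step, score_threshold, floor):
    """Return (start, end) token spans whose mean word score exceeds the threshold."""
    hits = []
    for start in range(0, max(len(tokens) - window, 0) + 1, step):
        span = tokens[start:start + window]
        if not span:
            continue
        # Words unknown to the second corpus fall back to the floor (assumption).
        mean = sum(scores.get(w, floor) for w in span) / len(span)
        if mean > score_threshold:
            hits.append((start, start + len(span)))
    return hits
```

Under these assumptions, a word that is frequent in general text but rare in the conversation's own field scores high, so intervals dominated by such words are flagged as idle talk.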