Abstract:
PROBLEM TO BE SOLVED: To estimate the similarity of phrases written in a plurality of character types. SOLUTION: An information processor 110 includes: a character decision part 118 which receives a character sequence representing a full name and decides the type of the characters constituting the received character sequence; a different notation acquisition part 120 which generates, from the character sequence, different notations in which the character sequence is written in different character types including ideograms or phonograms, and which generates notation vectors including at least two different types of full name notations including the phonograms; a similarity calculation part 122 which executes a different similarity determination depending on the character type and calculates, for the elements of the notation vectors, scores giving a measure of similarity; and a similarity score calculation part 124 which calculates similarity scores for full name candidates by using the scores calculated by the similarity calculation part 122. COPYRIGHT: (C)2010,JPO&INPIT
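The per-character-type scoring described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the method of the patent: the notation vector is modelled as a dictionary keyed by hypothetical character-type names ("kanji", "kana"), ideograms are compared by exact match, phonograms by normalized edit distance, and the overall score is a plain average. The notations are assumed to be supplied already; generating them from one another would need a reading dictionary and is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def ideogram_score(a, b):
    """Assumed rule for ideograms: only an exact match counts."""
    return 1.0 if a == b else 0.0


def phonogram_score(a, b):
    """Assumed rule for phonograms: 1 minus the normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical mapping from character type to its similarity rule.
SCORERS = {"kanji": ideogram_score, "kana": phonogram_score}


def notation_scores(query, candidate):
    """Score each element of the notation vector with the rule for its type."""
    return {kind: SCORERS[kind](query[kind], candidate[kind])
            for kind in query if kind in candidate and kind in SCORERS}


def similarity_score(query, candidate):
    """Combine the per-notation scores; a plain average is an assumption."""
    scores = notation_scores(query, candidate)
    return sum(scores.values()) / len(scores) if scores else 0.0


query = {"kanji": "山田", "kana": "ヤマダ"}
candidate = {"kanji": "山多", "kana": "ヤマタ"}
print(notation_scores(query, candidate))
print(similarity_score(query, candidate))
```

Keeping one scoring function per character type makes it straightforward to swap in a different rule for a given notation without touching the aggregation step.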
Abstract:
PROBLEM TO BE SOLVED: To provide an information processor for converting a character string, a character string converting method, a program, and an information processing system. SOLUTION: The information processor 100 includes: an original character obtaining part 116 for obtaining an original character string; a phoneme segmentation part 120 for segmenting the obtained original character string into original character phonemes by referring to phoneme data of the original character string; a conversion likelihood calculating part 122 for calculating a conversion likelihood for the sequence of original character phonemes formed by the phoneme segmentation part 120 by referring to a probability model obtained by learning, from which it obtains a conversion probability of the conversion destination phonemes, in the conversion target language, that correspond to the original character phonemes, and a transition probability of the conversion destination phonemes corresponding to consecutive original character phonemes; and a most likely phoneme sequence determining part 118 for determining and outputting the most likely phoneme sequence by referring to the conversion likelihood calculated by the conversion likelihood calculating part 122. COPYRIGHT: (C)2010,JPO&INPIT
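One common way to combine a per-phoneme conversion probability with a transition probability between consecutive output phonemes is Viterbi decoding. The sketch below assumes that technique; the abstract does not name the algorithm, and the phoneme symbols and probability tables are hypothetical toy values standing in for a learned model. It also assumes a non-empty input sequence and strictly positive conversion probabilities.

```python
import math


def most_likely_sequence(source, conv_prob, trans_prob, floor=1e-9):
    """Viterbi decoding over destination phonemes.

    source     : non-empty list of original phonemes
    conv_prob  : {original phoneme: {destination phoneme: probability}}
    trans_prob : {(previous destination, next destination): probability}
    floor      : assumed probability for transitions missing from trans_prob
    """
    # best[t] = (log-likelihood, path) of the best path ending in phoneme t
    best = {t: (math.log(p), [t]) for t, p in conv_prob[source[0]].items()}
    for s in source[1:]:
        nxt = {}
        for t, p in conv_prob[s].items():
            # Pick the predecessor that maximizes likelihood after transition.
            score, path = max(
                ((prev_score + math.log(trans_prob.get((prev_t, t), floor)),
                  prev_path)
                 for prev_t, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            nxt[t] = (score + math.log(p), path + [t])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy model: two original phonemes, two candidates each.
conv = {"ma": {"MA": 0.9, "ME": 0.1}, "i": {"I": 0.6, "Y": 0.4}}
trans = {("MA", "I"): 0.2, ("MA", "Y"): 0.8,
         ("ME", "I"): 0.5, ("ME", "Y"): 0.5}
print(most_likely_sequence(["ma", "i"], conv, trans))
```

In this toy model the second phoneme converts to "I" more often in isolation, but the transition probability after "MA" favors "Y", so the decoded sequence differs from a phoneme-by-phoneme choice.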
Abstract:
PROBLEM TO BE SOLVED: To provide an information processor, a full name identifying method, an information processing system, and a program. SOLUTION: The information processor 100 includes: a kanji normalization part 114 for normalizing a multi-byte character string to the character style to be registered; a morpheme analyzing part 116 for dividing the normalized character string into morphological tokens and acquiring the attribute identifiers allocated to the morphological tokens; a full name candidate preparing part 118 for generating a connection identifier from the morphological tokens, the attribute identifiers, and an attribute identifier between the morphological tokens, generating a cultural area weighting value that gives a weighting for a cultural area, and registering these as a full name candidate list; a score calculating part 120 for acquiring the morphological tokens, the connection identifier, and a score value allocated to the cultural area weighting value, calculating the total score value, and determining a full name candidate by using a full name distance that gives a measure of the distance from the head of a full name to its end; and a notation converting part 122 for outputting a single-byte character string corresponding to the family name and the given name of the morphological tokens included in the full name candidate. COPYRIGHT: (C)2010,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To automatically extract documents satisfying a pattern from an enormous amount of documents, to extract useful knowledge, and to reduce the time required for a response, by generating a field-dependent dictionary from document data, generating a syntax tree that takes modification into account by means of a language analysis device, and extracting and outputting a frequently appearing pattern by means of a pattern extraction device. SOLUTION: A language feature analysis device generates an analysis-dependent dictionary. The language analysis device requires attributes adjusted to the data to be analyzed, so a field-dependent dictionary needs to be prepared. Words having the specified attributes are to be generated for each field. The language feature analysis device checks the words against actual data and registers them in the field-dependent dictionary. A pattern extraction device obtains frequently appearing patterns by using document data whose structure has been analyzed by the language analysis device, and takes out the original documents having a syntax that matches a pattern. A frequently-appearing pattern device displays the documents having the detected frequently appearing pattern together with the syntax tree matched with it.
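The pattern extraction step can be sketched under strong simplifying assumptions. Here each parsed document is reduced to a set of hypothetical (modifier, head) pairs taken from its syntax tree, a pattern is a single such pair, and "frequently appearing" means appearing in at least `min_support` documents. The abstract does not define the pattern form or the frequency criterion, so both are illustrative choices.

```python
from collections import defaultdict


def frequent_patterns(parsed_docs, min_support):
    """Return {pattern: sorted ids of the documents that contain it}.

    parsed_docs : {doc_id: iterable of (modifier, head) pairs}, assumed to be
                  the output of an earlier syntactic analysis step
    min_support : minimum number of documents a pattern must appear in
    """
    docs_by_pattern = defaultdict(set)
    for doc_id, pairs in parsed_docs.items():
        for pair in set(pairs):          # count each pattern once per document
            docs_by_pattern[pair].add(doc_id)
    return {pattern: sorted(ids)
            for pattern, ids in docs_by_pattern.items()
            if len(ids) >= min_support}


# Hypothetical analysed documents.
docs = {
    "d1": [("slow", "response"), ("new", "printer")],
    "d2": [("slow", "response")],
    "d3": [("broken", "screen")],
}
print(frequent_patterns(docs, min_support=2))
```

Keeping the document ids alongside each pattern is what allows the original documents matching a frequent pattern to be retrieved afterwards.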
Abstract:
PROBLEM TO BE SOLVED: To detect at high speed the correlation between data having both a numerical attribute and a specific (0-1) attribute by dividing a numerical attribute axis into plural sections, counting the number of data included in every divided section and the number of those data whose (0-1) attribute equals 1, and then performing specific processing. SOLUTION: A bucket processing part 1510 divides a numerical attribute axis corresponding to the numerical attribute into plural sections and counts, for every divided section, the number of data included and the number of data whose (0-1) attribute equals 1. A plane constitution processing part 1520 virtually constitutes a plane by means of a 1st axis corresponding to the total number of data on every section and a 2nd axis corresponding to the total number of data whose (0-1) attribute equals 1 on every section. The part 1520 then virtually plots on the plane the points corresponding to these values for the sections. Furthermore, a largest tilt line extraction part 1530 extracts, from among the pairs of points whose interval along the 1st axis is larger than T·N (T: rate, N: total number of data), the pair whose connecting line has the largest tilt, and then outputs the section corresponding to the extracted pair of points.
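The bucketing, plotting, and largest-tilt steps can be sketched as follows. Two assumptions are made that the abstract does not state explicitly: the plotted points are taken to be cumulative counts up to each section boundary, so that the tilt between two points equals the fraction of data with attribute 1 in the sections between them, and the pair is found by a brute-force search over all pairs rather than by whatever faster method the patent uses. The function name, equal-width buckets, and the `lo`/`hi` range parameters are likewise hypothetical.

```python
from itertools import accumulate


def best_interval(values, flags, num_buckets, lo, hi, rate):
    """Find the run of buckets with the steepest tilt on the count plane.

    values : numerical attribute of each record, assumed to lie in [lo, hi]
    flags  : 0-1 attribute of each record
    rate   : T in the abstract; the run must hold more than rate * N records
    Returns (tilt, first_bucket, last_bucket) or None if no run qualifies.
    """
    n = len(values)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets   # records per bucket
    hits = [0] * num_buckets     # records per bucket whose flag equals 1
    for v, f in zip(values, flags):
        b = min(int((v - lo) / width), num_buckets - 1)
        counts[b] += 1
        hits[b] += f

    # Points on the plane: cumulative totals up to each bucket boundary.
    xs = [0] + list(accumulate(counts))
    ys = [0] + list(accumulate(hits))

    best = None
    for i in range(num_buckets):
        for j in range(i + 1, num_buckets + 1):
            dx = xs[j] - xs[i]
            if dx > rate * n:                      # interval along the 1st axis
                tilt = (ys[j] - ys[i]) / dx
                if best is None or tilt > best[0]:
                    best = (tilt, i, j - 1)
    return best


values = [0, 1, 2, 3, 4, 5, 6, 7]
flags = [0, 0, 1, 1, 1, 0, 0, 0]
print(best_interval(values, flags, num_buckets=4, lo=0, hi=8, rate=0.25))
```

With four buckets of width 2, the steepest qualifying run is buckets 1 to 2 (values from 2 up to but not including 6), where 3 of the 4 records have the attribute set.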