Information processor, information processing method, information processing system, and program
    11.
    Invention patent
    Status: in force

    Publication number: JP2010044597A

    Publication date: 2010-02-25

    Application number: JP2008208297

    Filing date: 2008-08-13

    Abstract: PROBLEM TO BE SOLVED: To estimate the similarity of phrases written in multiple character types.
    SOLUTION: An information processor 110 includes: a character decision part 118 that receives a character sequence representing a full name and determines the type of the characters composing it; a different notation acquisition part 120 that generates, from the character sequence, alternative notations written in other character types, including ideograms or phonograms, and builds notation vectors containing at least two different types of full name notation, including a phonographic one; a similarity calculation part 122 that applies a different similarity determination depending on the character type and calculates scores measuring similarity for the elements of the notation vectors; and a similarity score calculation part 124 that calculates similarity scores for full name candidates from the scores calculated by the similarity calculation part 122.
    COPYRIGHT: (C)2010,JPO&INPIT
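The comparison of notation vectors described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the notation vectors are written by hand (the abstract does not say how the alternative notations are generated), the per-type rules (exact match for kanji, `difflib` ratio for kana and Latin) and the `WEIGHTS` table are hypothetical, and all function names are invented for this example.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical weights per notation type; the abstract gives no values.
WEIGHTS = {"kanji": 0.5, "kana": 0.3, "latin": 0.2}


def char_type(text):
    """Classify a string as 'kanji', 'kana', 'latin', 'other', or 'mixed'."""
    kinds = set()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            kinds.add("kanji")
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            kinds.add("kana")
        elif name.startswith("LATIN"):
            kinds.add("latin")
        else:
            kinds.add("other")
    return kinds.pop() if len(kinds) == 1 else "mixed"


def element_score(kind, a, b):
    """Apply a different similarity rule depending on the character type."""
    if kind == "kanji":
        # Assumption: ideographic notations are compared by exact match.
        return 1.0 if a == b else 0.0
    # Assumption: phonographic notations are compared by edit-style ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_similarity(vec_a, vec_b):
    """Weighted average of element scores over notation types both vectors have."""
    shared = [k for k in WEIGHTS if k in vec_a and k in vec_b]
    if not shared:
        return 0.0
    total = sum(WEIGHTS[k] for k in shared)
    return sum(WEIGHTS[k] * element_score(k, vec_a[k], vec_b[k]) for k in shared) / total


# Hand-written notation vectors for two spellings of the same surname.
saito_a = {"kanji": "斉藤", "kana": "さいとう", "latin": "Saito"}
saito_b = {"kanji": "斎藤", "kana": "さいとう", "latin": "Saito"}
```

With these toy vectors the kanji elements differ while the kana and Latin elements agree, so the two names still receive a nonzero combined score, which is the point of comparing several notations at once.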

    Information processor for converting character string, character string converting method, program and information processing system
    12.
    Invention patent
    Status: in force

    Publication number: JP2010009329A

    Publication date: 2010-01-14

    Application number: JP2008168087

    Filing date: 2008-06-27

    Inventor: FUKUDA TSUYOSHI

    Abstract: PROBLEM TO BE SOLVED: To provide an information processor for converting a character string, a character string converting method, a program and an information processing system.
    SOLUTION: The information processor 100 includes: an original character obtaining part 116 for obtaining an original character string; a phoneme segmentation part 120 for segmenting the obtained original character string into original character phonemes by referring to phoneme data of the original character string; a conversion likelihood calculating part 122 for calculating a conversion likelihood for the sequence of original character phonemes formed by the phoneme segmentation part 120, by referring to a probability model formed from learning to obtain (i) a conversion probability of the conversion destination phonemes, in the language to be converted to, that correspond to the original character phonemes and (ii) a transition probability of the conversion object phonemes corresponding to consecutive original character phonemes; and a most likely phoneme sequence determining part 118 for determining and outputting the most likely phoneme sequence by referring to the conversion likelihood calculated by the conversion likelihood calculating part 122.
    COPYRIGHT: (C)2010,JPO&INPIT
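Combining a per-phoneme conversion probability with a transition probability between consecutive target phonemes, then picking the most likely sequence, resembles a Viterbi-style search. The sketch below assumes that reading; the abstract does not name the search procedure. The probability tables are toy values invented for illustration (the abstract's model is learned), and `most_likely_sequence` and the `floor` parameter are hypothetical names.

```python
import math


def most_likely_sequence(source, conv_prob, trans_prob, floor=1e-6):
    """Viterbi-style search over target phonemes for a non-empty source sequence.

    conv_prob[src][tgt]      : probability that source phoneme src converts to tgt
    trans_prob[(prev, tgt)]  : probability that target phoneme tgt follows prev
    floor                    : assumed fallback for unseen transitions
    """
    # best[t] = (log-likelihood, path) of the best path ending in target phoneme t
    best = {t: (math.log(p), [t]) for t, p in conv_prob[source[0]].items()}
    for src in source[1:]:
        nxt = {}
        for tgt, p in conv_prob[src].items():
            score, path = max(
                (
                    (prev_score + math.log(trans_prob.get((prev, tgt), floor)), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            nxt[tgt] = (score + math.log(p), path + [tgt])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Toy model: katakana segments converted to Latin-script phonemes.
source = ["ラ", "イ", "ト"]
conv_prob = {
    "ラ": {"ra": 0.5, "la": 0.5},
    "イ": {"i": 1.0},
    "ト": {"to": 0.6, "t": 0.4},
}
trans_prob = {
    ("ra", "i"): 0.3,
    ("la", "i"): 0.7,
    ("i", "to"): 0.2,
    ("i", "t"): 0.8,
}

result = most_likely_sequence(source, conv_prob, trans_prob)
```

In this toy model the conversion probabilities alone would prefer "to" for the last segment; the transition probabilities are what tip the result toward "t".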

    Information processor, full name identifying method, information processing system, and program
    13.
    Invention patent
    Status: in force

    Publication number: JP2009266110A

    Publication date: 2009-11-12

    Application number: JP2008117538

    Filing date: 2008-04-28

    Inventor: FUKUDA TSUYOSHI

    Abstract: PROBLEM TO BE SOLVED: To provide an information processor, a full name identifying method, an information processing system, and a program.
    SOLUTION: The information processor 100 includes: a kanji normalization part 114 for normalizing a multi-byte character string to the character style to be registered; a morpheme analyzing part 116 for dividing the normalized character string into morphological tokens and acquiring the attribute identifiers allocated to the morphological tokens; a full name candidate preparing part 118 for generating a connection identifier from the morphological tokens, the attribute identifiers, and an attribute identifier between the morphological tokens, generating a cultural area weighting value that applies a weighting for a cultural area, and registering these as a full name candidate list; a score calculating part 120 for acquiring the score values allocated to the morphological tokens, the connection identifier, and the cultural area weighting value, calculating the total score value, and determining a full name candidate by using a full name distance that gives a measure of the distance from the head of a full name to its end; and a notation converting part 122 for outputting a single-byte character string corresponding to the family name and given name of the morphological tokens included in the full name candidate.
    COPYRIGHT: (C)2010,JPO&INPIT
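The scoring step in this abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the score tables, the `family>given` form of the connection identifier, and the way the cultural area weight multiplies the sum are invented, and the full name distance mentioned in the abstract is left out because the abstract does not define it.

```python
# Hypothetical score tables; the abstract does not give values or formats.
TOKEN_SCORE = {
    ("山田", "family"): 3.0,
    ("太郎", "given"): 3.0,
    ("山", "family"): 1.0,
    ("田太郎", "given"): 0.5,
}
CONNECTION_SCORE = {"family>given": 2.0, "given>family": 0.5}


def connection_ids(tokens):
    """Build one connection identifier per adjacent pair of token attributes."""
    return [f"{a[1]}>{b[1]}" for a, b in zip(tokens, tokens[1:])]


def total_score(tokens, culture_weight=1.0):
    """Sum token and connection scores, scaled by an assumed cultural area weight."""
    token_part = sum(TOKEN_SCORE.get(tok, 0.0) for tok in tokens)
    conn_part = sum(CONNECTION_SCORE.get(c, 0.0) for c in connection_ids(tokens))
    return culture_weight * (token_part + conn_part)


def best_candidate(candidates):
    """Pick the segmentation with the highest total score."""
    return max(candidates, key=lambda c: total_score(c["tokens"], c["culture_weight"]))


# Two candidate segmentations of the same normalized string 山田太郎.
candidates = [
    {"tokens": [("山", "family"), ("田太郎", "given")], "culture_weight": 1.0},
    {"tokens": [("山田", "family"), ("太郎", "given")], "culture_weight": 1.0},
]
winner = best_candidate(candidates)
```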

    METHOD AND DEVICE FOR EXTRACTING KNOWLEDGE FROM ENORMOUS DOCUMENT DATA AND MEDIUM

    Publication number: JP2001084250A

    Publication date: 2001-03-30

    Application number: JP23967499

    Filing date: 1999-08-26

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To automatically extract a document satisfying a pattern from enormous amount of documents, to extract useful knowledge and to reduce time required for a response by generating a field-dependent dictionary from document data, generating a syntax tree considering modification, by means of a language analysis device and extracting/outputting a frequently-appearing pattern by means of a pattern extraction device.
    SOLUTION: A language feature analysis device generates an analysis-dependent dictionary. A language analysis device needs to prepare a field-dependent dictionary for requiring an attribute adjusted to data to be analyzed. A word having the specified attribute is to be generated by each field. The language feature analysis device checks the word from actual data and registers it in the field-dependent dictionary. A pattern extraction device obtains a pattern, which frequently appears by using document data which is structure-analyzed by the device and takes out an original document having a syntax which is matched with the pattern. A frequently-appearing pattern device displays the document, having the detected frequently-appearing pattern and a syntax tree matched with it.
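A much-reduced sketch of the pattern extraction step is given below. It assumes the documents have already been parsed into modifier-to-head word pairs, treats a single pair as a "pattern", and counts the number of documents containing it. The abstract does not specify the pattern form or the threshold, so `frequent_patterns`, `min_support`, and the sample data are hypothetical.

```python
from collections import defaultdict


def frequent_patterns(parsed_docs, min_support):
    """Return {pattern: [doc ids]} for patterns found in at least min_support documents.

    parsed_docs maps a document id to its (modifier, head) pairs, standing in
    for the syntax trees that a language analysis step would produce.
    """
    support = defaultdict(set)
    for doc_id, pairs in parsed_docs.items():
        for pair in pairs:
            support[pair].add(doc_id)
    return {
        pattern: sorted(doc_ids)
        for pattern, doc_ids in support.items()
        if len(doc_ids) >= min_support
    }


# Toy pre-parsed data (invented): each pair is (modifier, head).
parsed_docs = {
    "doc1": [("screen", "freeze"), ("driver", "install")],
    "doc2": [("screen", "freeze"), ("battery", "drain")],
    "doc3": [("screen", "freeze"), ("driver", "install")],
}
patterns = frequent_patterns(parsed_docs, min_support=2)
```

Keeping the document ids alongside each pattern mirrors the abstract's step of taking out the original documents whose syntax matches a frequent pattern.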

    METHOD AND APPARATUS FOR DERIVATION OF OPTIMIZATION COUPLING RULE

    Publication number: JPH09134365A

    Publication date: 1997-05-20

    Application number: JP28483695

    Filing date: 1995-11-01

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To detect at high speed the correlation between data having both a numerical attribute and a specific (0-1) attribute by dividing a numerical attribute axis into plural sections, counting the number of data included in every divided section and also the number of data on the (0-1) attribute, and then performing a specific processing.
    SOLUTION: A bucket processing part 1510 divides a numerical attribute axis corresponding to the numerical attribute into plural sections and counts, for every divided section, the number of data and the number of data whose (0-1) attribute equals 1. A plane constitution processing part 1520 virtually constitutes a plane by means of a 1st axis corresponding to the total number of data on every section and a 2nd axis corresponding to the total number of data having the (0-1) attribute equal to 1 on every section. Then the part 1520 virtually plots the points corresponding to the values of the sections on the plane. Furthermore, a largest tilt line extraction part 1530 extracts, among those pairs of points whose interval along the 1st axis is larger than T·N (T: rate, N: total data number), the pair whose connecting line has the largest tilt, and then outputs the corresponding section between the extracted pair of points.
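The bucket, plane, and largest-tilt steps can be sketched as below, assuming the plotted points are cumulative totals over the sections so that the tilt between two points is the fraction of data with attribute 1 in the section between them. This sketch swaps in a brute-force search over all point pairs, so it does not reproduce the high-speed extraction the abstract is aiming at; equal-width buckets, the "at least T·N" reading of the interval condition, and the names `best_range`, `num_buckets`, and `min_support` are also assumptions.

```python
def best_range(records, num_buckets, min_support):
    """Find the value range with the highest share of flag == 1.

    records      : list of (numeric_value, flag) with flag in {0, 1}
    num_buckets  : number of equal-width sections on the numeric axis (assumed)
    min_support  : the rate T; a range must hold at least T * N records
    Returns (low, high, confidence) or None if no range qualifies.
    """
    values = [v for v, _ in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # avoid zero width when all values match

    # Bucket step: count all records and flagged records per section.
    count = [0] * num_buckets
    hits = [0] * num_buckets
    for v, flag in records:
        b = min(int((v - lo) / width), num_buckets - 1)
        count[b] += 1
        hits[b] += flag

    # Plane step: cumulative (total, flagged) points, starting at the origin.
    xs, ys = [0], [0]
    for c, h in zip(count, hits):
        xs.append(xs[-1] + c)
        ys.append(ys[-1] + h)

    # Largest-tilt step, brute force over all pairs of points.
    need = min_support * len(records)
    best = None
    for i in range(num_buckets):
        for j in range(i + 1, num_buckets + 1):
            dx = xs[j] - xs[i]
            if dx > 0 and dx >= need:
                slope = (ys[j] - ys[i]) / dx
                if best is None or slope > best[2]:
                    best = (lo + i * width, lo + j * width, slope)
    return best


# Toy data: the flag is 1 only for values 4, 5 and 6.
records = [(v, 1 if 4 <= v <= 6 else 0) for v in range(10)]
found = best_range(records, num_buckets=10, min_support=0.25)
```

The tilt of the line between two cumulative points equals the confidence of the rule "value in this range implies attribute = 1", which is why the steepest qualifying line identifies the best range.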
