-
Publication No.: JP2003263427A
Publication date: 2003-09-19
Application No.: JP2002062625
Application date: 2002-03-07
Applicant: ATR ADVANCED TELECOMM RES INST
Inventor: YAMAMOTO HIROSHI, KIKUI GENICHIRO
Abstract: PROBLEM TO BE SOLVED: To provide a method of dividing a sentence into words by unsupervised learning, without using heuristics. SOLUTION: This method generates a word division model from training sentences that have no word divisions. It comprises a first step of generating, from all the given training sentences and using given dictionary entries, a network of the candidate words into which the sentences can be divided; a second step of generating, for the network of candidate words produced in the first step, a model that minimizes entropy; and a third step of smoothing the transition probability values, i.e. the probability values for predicting the following word from a known word or a known word pair. COPYRIGHT: (C)2003,JPO
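The first and third steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy dictionary, and the sample string are hypothetical, the entropy-minimizing model of the second step is not implemented, and add-k smoothing is used only as a stand-in because the abstract does not say which smoothing scheme is applied.

```python
from collections import Counter

def build_lattice(sentence, dictionary):
    """Step 1 (sketch): list every dictionary word found in the unsegmented
    sentence as an edge (start, end, word) of a candidate-word network."""
    edges = []
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word in dictionary:
                edges.append((start, end, word))
    return edges

def segmentations(sentence, dictionary):
    """Enumerate every complete path through the network as a word list."""
    edges_from = {}
    for start, end, word in build_lattice(sentence, dictionary):
        edges_from.setdefault(start, []).append((end, word))

    def walk(pos):
        if pos == len(sentence):
            yield []
            return
        for end, word in edges_from.get(pos, []):
            for rest in walk(end):
                yield [word] + rest

    return list(walk(0))

def smoothed_transition(prev, word, bigrams, contexts, vocab_size, k=1.0):
    """Step 3 (stand-in): add-k smoothed estimate of P(word | prev).
    The abstract does not name its smoothing method; add-k is an assumption."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)

dictionary = {"a", "b", "ab", "c"}          # hypothetical dictionary entries
paths = segmentations("abc", dictionary)    # all candidate divisions

# Count word pairs over all candidate paths (uniform weighting, an assumption;
# the entropy-minimizing model would weight the paths instead).
bigrams = Counter()
for path in paths:
    bigrams.update(zip(path, path[1:]))
contexts = Counter()
for (prev, _), n in bigrams.items():
    contexts[prev] += n

vocab = sorted(dictionary)
p = smoothed_transition("ab", "c", bigrams, contexts, len(vocab))
```

Smoothing keeps unseen word pairs from receiving zero probability, so a path through the network is never ruled out solely because one of its transitions was absent from the training counts.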
-
Publication No.: JP2003263430A
Publication date: 2003-09-19
Application No.: JP2002064099
Application date: 2002-03-08
Applicant: ATR ADVANCED TELECOMM RES INST
Inventor: SUGAYA FUMIAKI, KANESHIRO YUMIKO, TAKEZAWA TOSHIYUKI, KIKUI GENICHIRO, YAMAMOTO SEIICHI
IPC: G06F17/28
Abstract: PROBLEM TO BE SOLVED: To efficiently collect large-scale data by collecting words and phrases divided into cells, and adding synonyms and the like to the cells. SOLUTION: When an original sentence written in English is presented (S1), its Japanese translation (a Japanese sentence) is inputted (S3). When a first prescribed number of sentences (at least two, for instance) have been inputted (S5), the sentences are divided into character strings such as words or phrases. At this time, identical character strings are put together in the same cell, and differing character strings are placed (distributed) in different cells (S7). Partial information, such as synonyms and related words of the character string, is added to each cell (S11). The words or phrases, i.e., language data, are thus collected divided into cells. COPYRIGHT: (C)2003,JPO
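The cell-based collection in steps S7 and S11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented system: the function names, the cell layout (a dictionary keyed by character string), and the romanized sample translations are all hypothetical, and the sentences are assumed to be already divided into character strings.

```python
def collect_cells(divided_sentences):
    """S7 (sketch): put identical character strings into one cell and
    different strings into different cells. Each cell records which input
    sentences contained the string and holds a list for added information."""
    cells = {}
    for sentence_index, strings in enumerate(divided_sentences):
        for s in strings:
            cell = cells.setdefault(s, {"sentences": [], "info": []})
            if sentence_index not in cell["sentences"]:
                cell["sentences"].append(sentence_index)
    return cells

def add_info(cells, string, info):
    """S11 (sketch): attach partial information, such as a synonym or a
    related word, to the cell of the given character string."""
    cells[string]["info"].append(info)

# Two hypothetical translations of one English sentence, already divided
# into words (romanized here purely for illustration).
translations = [
    ["watashi", "wa", "eki", "ni", "iku"],
    ["boku", "wa", "eki", "e", "iku"],
]
cells = collect_cells(translations)
add_info(cells, "watashi", {"synonym": "boku"})
```

Keying the cells by the character string itself means a string shared by several translations is stored once, so information added to it applies to every sentence in which it occurs.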
-