Abstract:
A support system for efficiently creating synonym candidates when compiling a thesaurus used in text mining, and a method for creating synonym candidates, are disclosed. A synonym candidate acquiring device (130) creates, for each author, an author synonym candidate set containing synonym candidates similar to an input word from the data (110) on that author, and creates a whole synonym candidate set containing synonym candidates similar to the input word from the whole data (120). A synonym candidate judging device (150) receives the created synonym candidate sets (140) and evaluates the synonym candidates of the whole data (120). During the evaluation, the status “absolute” is added to a word that matches the word ranked first in the synonym candidates for each author, and the status “negative” is added to a word that matches a word ranked second or lower.
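The judging step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `judge_candidates`, the label "unmarked" for words that appear in no author set, and the rule that "absolute" takes precedence when a word ranks first for one author and lower for another are all hypothetical and not taken from the abstract.

```python
def judge_candidates(whole_candidates, author_candidates):
    """Label each whole-data candidate using the per-author rankings.

    whole_candidates: words ranked by similarity over the whole data.
    author_candidates: dict mapping an author to that author's ranked words.
    """
    # Words ranked first in at least one author's candidate set.
    first_place = {ranked[0] for ranked in author_candidates.values() if ranked}
    # Words ranked second or lower in at least one author's candidate set.
    lower_place = {w for ranked in author_candidates.values() for w in ranked[1:]}

    statuses = {}
    for word in whole_candidates:
        if word in first_place:
            statuses[word] = "absolute"  # assumption: first place wins conflicts
        elif word in lower_place:
            statuses[word] = "negative"
        else:
            statuses[word] = "unmarked"  # hypothetical label, not in the abstract
    return statuses


example = judge_candidates(
    ["client", "customer", "user", "buyer"],
    {"author_a": ["client", "user"], "author_b": ["customer", "user"]},
)
```

Here "client" and "customer" each rank first for one author, "user" ranks second for both, and "buyer" appears in no author set.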
Abstract:
PROBLEM TO BE SOLVED: To extract meaningful text blocks from a document that may be laid out using tables, itemization, multicolumn composition, and the like. SOLUTION: A document laid out with blanks, etc., is input, and symbols associated with the spatial coordinates of the document are acquired. Runs of characters of the same type are extracted from the symbols, and tokens and spaces are generated. A stream is generated from spaces that continue in the column direction, and text blocks are generated from the streams and the tokens. Links between text blocks are generated and defined as a document graph. The propriety of each connection (link) between text blocks in the document graph is evaluated by using a language model, and when the connection is proper, the text blocks are merged.
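The first two steps (runs of same-type characters, and spaces that continue in the column direction) might look like the sketch below. The character classes in `char_kind`, the function names, and the simplification that a "stream" is any column that is blank on every line are assumptions made for illustration; the abstract does not specify them, and the later steps (text blocks, document graph, language-model evaluation) are not shown.

```python
from itertools import groupby


def char_kind(ch):
    """Classify a character; these four classes are an assumed scheme."""
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "symbol"


def tokenize_line(line):
    """Split a line into runs of same-kind characters with start columns."""
    runs = []
    col = 0
    for kind, group in groupby(line, key=char_kind):
        text = "".join(group)
        runs.append((kind, col, text))
        col += len(text)
    return runs


def blank_columns(lines):
    """Return column indexes that are blank on every line (a crude 'stream')."""
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    return [c for c in range(width) if all(row[c] == " " for row in padded)]


runs = tokenize_line("ab 12")
gutter = blank_columns(["ab cd", "ef gh"])
```

A vertical gutter found this way would separate the tokens on its left from those on its right into candidate text blocks.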
Abstract:
PROBLEM TO BE SOLVED: To provide a means for performing text mining of document data written in a language other than mother language or familiar language, and for satisfying a request for retrieval. SOLUTION: This computer system outputting a term in a second language paired with a term in a first language to be translated includes: a first extraction part for extracting a co-occurrence term which co-occurs with the term of the first language from a corpus of the first language; an output part for outputting a word translated in the second language corresponding to at least one of the extracted co-occurrence terms; a second extraction part for extracting translation candidates which co-occur with at least one of the output words in the second language from a corpus of the second language corresponding to the corpus of the first language; a weighting part for weighting each of the extracted translation candidates; and a generation part for optimizing the weight, and for generating the list of the translation pairs for the term in the first language according to the optimized weight. COPYRIGHT: (C)2010,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To suitably detect liking expressions indicating people's liking for a commodity or the like. SOLUTION: An expression detection system for detecting liking expressions indicating an evaluator's liking for a specific evaluation target from texts in which the evaluation of the specific evaluation target is described stores a plurality of texts in which the evaluation of the specific evaluation target is described, in association with the attributes of the respective texts, extracts the evaluation expressions indicating the evaluation of the specific evaluation target from the respective texts, judges whether each extracted evaluation expression has positive polarity indicating a positive evaluation of the specific evaluation target or negative polarity indicating a negative evaluation of the specific evaluation target, inputs the attribute of the texts specified as the object for detecting liking expressions, detects the evaluation expressions found in the texts having the input attribute as liking expressions, and then outputs the liking expressions according to the frequency with which each liking expression in the texts having the attribute is judged to be of positive polarity or negative polarity. COPYRIGHT: (C)2006,JPO&NCIPI
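The attribute-filtered counting described above can be sketched as follows. This is a toy illustration under stated assumptions: the polarity lexicon, the substring matching used in place of real evaluation-expression extraction, and the name `liking_expressions` are hypothetical and not taken from the abstract.

```python
from collections import Counter


def liking_expressions(records, attribute, lexicon):
    """Count polar expressions in texts that carry the given attribute.

    records: list of (attribute, text) pairs.
    lexicon: dict mapping an expression to "positive" or "negative"
             (a stand-in for the extraction and polarity-judging steps).
    """
    counts = Counter()
    for attr, text in records:
        if attr != attribute:
            continue
        lowered = text.lower()
        for expression, polarity in lexicon.items():
            if expression in lowered:
                counts[(expression, polarity)] += 1
    # Most frequent (expression, polarity) pairs first.
    return counts.most_common()


records = [
    ("20s", "Stylish design but noisy"),
    ("20s", "Very stylish"),
    ("40s", "Quiet and sturdy"),
]
lexicon = {"stylish": "positive", "noisy": "negative", "sturdy": "positive"}
result = liking_expressions(records, "20s", lexicon)
```

Only the two texts with the attribute "20s" contribute, so "sturdy" is not counted.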
Abstract:
PURPOSE: To enable syntax analysis with a certain degree of accuracy for any sentence by reanalyzing a sentence that cannot be analyzed, using the word string of a sentence in the same context for which syntax analysis succeeded, in syntax analysis processing that depends on grammatical knowledge. CONSTITUTION: A morpheme analysis block 104 divides a sentence supplied from an input block 102 and analyzes the part of speech or inflection of each word while referring to a dictionary, and a syntax analysis block 106 performs processing for providing information consisting of a tree structure based on the output information from the morpheme analysis block 104. Further, a context analysis block 108 holds the context information of all of the plural input sentences and performs processing to provide as exact an analysis result as possible by applying the context information to a sentence for which the syntax analysis block 106 output an unsuitable analysis result, namely, a sentence (grammatically unsuitable sentence) that cannot be analyzed by conventional syntax analysis depending on grammatical knowledge. Thus, the accuracy of the natural language processing system is improved and syntax analysis is enabled with a certain degree of accuracy.
Abstract:
PROBLEM TO BE SOLVED: To provide information that is helpful for predicting technology trends by analyzing the subject description part and effect description part of a technical document. SOLUTION: This technology trend prediction support device is provided with: a description part extraction part for extracting a subject description part and an effect description part from a technical document; a technical expression extraction part for extracting technical expressions expressing matters to be achieved by the technology from the subject description part and the effect description part; an influence degree decision part for deciding the degree of influence on business of the matters expressed by the extracted technical expressions; a naming part for naming the extracted technical expressions; and a technology map creation part for creating a technology map. The created technology map has axes related to the time to be spent on achieving the technology and the degree of influence on business, and the names of the extracted technical expressions are displayed at the pertinent coordinates. COPYRIGHT: (C)2009,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To provide an expression extraction device which extracts evaluation expressions showing evaluations of an evaluation object from a text and properly determines their polarity. SOLUTION: The expression extraction device, which extracts evaluation expressions from a text having descriptions of evaluations of a specific evaluation object, is provided with: a registered expression storage part for registering an evaluation expression having a predetermined polarity as a registered expression; an expression extraction part for extracting a plurality of evaluation expressions and conjunction expressions from the text; a registered expression detection part for detecting, out of the plurality of evaluation expressions, an evaluation expression including a registered expression registered in the registered expression storage part; and a polarity determination part for determining that an evaluation expression which is joined to the evaluation expression including the registered expression by a conjunction expression in the form of a copulative conjunction, and a series of evaluation expressions which are not joined to that evaluation expression by a conjunction expression in any form of adversative/copulative conjunction and are not joined to each other by a conjunction expression in any form of adversative/copulative conjunction, are of the same polarity as the registered expression. COPYRIGHT: (C)2005,JPO&NCIPI
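A simplified reading of the polarity determination above can be sketched as follows. The sketch assumes that polarity spreads across copulative links and across neighbours with no conjunction between them, and stops at adversative links (leaving the other side undetermined); it also matches registered expressions exactly rather than by inclusion. These simplifications, and the name `propagate_polarity`, are assumptions for illustration, not the device's actual procedure.

```python
def propagate_polarity(expressions, links, registered):
    """Spread known polarities to neighbouring evaluation expressions.

    expressions: evaluation expressions in text order.
    links: links[i] describes the conjunction between expressions[i] and
           expressions[i + 1]: "copulative", "adversative", or None.
    registered: dict mapping a registered expression to its polarity.
    """
    polarity = [registered.get(e) for e in expressions]
    changed = True
    while changed:
        changed = False
        for i, link in enumerate(links):
            if link == "adversative":
                continue  # assumption: no propagation across "but"-type links
            left, right = polarity[i], polarity[i + 1]
            if left is not None and right is None:
                polarity[i + 1] = left
                changed = True
            elif right is not None and left is None:
                polarity[i] = right
                changed = True
    return dict(zip(expressions, polarity))


labels = propagate_polarity(
    ["light", "good", "expensive"],
    ["copulative", "adversative"],
    {"good": "positive"},
)
```

In this example "light" inherits "positive" from "good" through the copulative link, while "expensive" stays undetermined because it sits behind an adversative link.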
Abstract:
PROBLEM TO BE SOLVED: To automatically extract documents satisfying a pattern from an enormous number of documents, to extract useful knowledge, and to reduce the time required for a response by generating a field-dependent dictionary from document data, generating a syntax tree that takes modification into account by means of a language analysis device, and extracting and outputting frequently appearing patterns by means of a pattern extraction device. SOLUTION: A language feature analysis device generates an analysis-dependent dictionary. A language analysis device needs a field-dependent dictionary to be prepared, because it requires attributes adjusted to the data to be analyzed. Words having the specified attributes are to be generated for each field. The language feature analysis device checks the words against actual data and registers them in the field-dependent dictionary. A pattern extraction device obtains patterns that appear frequently by using document data that has been structure-analyzed by the device, and takes out the original documents having a syntax that matches a pattern. A frequently appearing pattern device displays the documents having the detected frequently appearing pattern and the syntax trees matched with it.
Abstract:
PROBLEM TO BE SOLVED: To improve the selection accuracy of translated words in machine translation without lowering the processing efficiency by using plural types of dictionaries, including a context dictionary, when a word that is not defined in a compound word dictionary is translated in a sentence. SOLUTION: Every input sentence is taken out at its head (110), and the compound words corresponding to a word string composing a single input sentence are retrieved from a compound word dictionary. When each of the words that do not correspond to the compound words is translated (120), its translation is decided by a context dictionary and the translated word is obtained based on the translation (130). The translated word undergoes the translation result registering processing. Then it is checked whether an object word is stored in a translation result recording buffer as a header. If the object word is stored in the buffer, it is checked whether or not its translated word is stored in the buffer, and all words that do not correspond to the compound words are processed (140, 160). Then all words are translated (170). When it is decided that a full sentence has been translated, the sentence is registered (180 to 195) after undergoing the retranslation effect evaluation processing.
Abstract:
PROBLEM TO BE SOLVED: To more accurately detect negative opinions in social media. SOLUTION: A method for processing, with a computer, a plurality of messages sent by each of a plurality of users over time includes: obtaining a plurality of messages each including a specific proper noun; determining a politeness level of each of the messages; and calculating a proportion of messages having a politeness level lower than a predetermined threshold, for the messages including a specific word.
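The three steps of the method can be sketched as follows. The politeness scorer here is a deliberately crude stand-in (it counts a few polite markers), and the names `toy_politeness` and `impolite_proportion`, the marker list, and the threshold value are all hypothetical; the abstract does not say how the politeness level is determined.

```python
POLITE_MARKERS = ("please", "thank")  # hypothetical marker list


def toy_politeness(message):
    """Assumed scorer: number of polite markers found in the message."""
    lowered = message.lower()
    return sum(1 for marker in POLITE_MARKERS if marker in lowered)


def impolite_proportion(messages, proper_noun, word, politeness, threshold):
    """Share of low-politeness messages among those mentioning `word`.

    Only messages containing `proper_noun` are considered at all.
    """
    obtained = [m for m in messages if proper_noun in m]
    relevant = [m for m in obtained if word in m]
    if not relevant:
        return 0.0
    low = sum(1 for m in relevant if politeness(m) < threshold)
    return low / len(relevant)


messages = [
    "Acme battery died again",
    "Please check the Acme battery, thank you",
    "Acme screen is nice",
    "battery from another brand",
]
share = impolite_proportion(messages, "Acme", "battery", toy_politeness, 1)
```

Two messages mention both "Acme" and "battery", and one of them scores below the threshold, so the proportion in this example is one half.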