Abstract:
PROBLEM TO BE SOLVED: To provide a technology for retrieving documents that match a dependency pattern at high speed from a large volume of text documents. SOLUTION: An index creation part creates an index from which, for each word appearing as a node in the tree of a syntactic analysis result, the array of that node's appearance information (document ID, position in the tree) can be acquired by sequential access. A query input part receives a query from a user or an external application. The query consists of a retrieval pattern, a pivot (the node serving as the reference for extending the retrieval pattern), the maximum depth difference allowed when retrieving an extended node from the pivot, the maximum number of extended nodes to be presented in order of frequency, and a flag designating whether to retrieve higher-rank nodes. An index reading part obtains the appearance information array of the pivot at places matching the retrieval pattern. The retrieval is performed until it reaches any node connecting the route to the pivot. COPYRIGHT: (C)2009,JPO&INPIT
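The index and query described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy corpus `TREES`, the function names, the simplification of the retrieval pattern to "the pivot has a direct child with a given word", and the reading of the flag as "walk up to ancestors instead of down to descendants". It is not the patented implementation.

```python
from collections import Counter, defaultdict

# Toy parse trees keyed by document ID: each node is (word, parent position),
# and the root has parent -1. This corpus is hypothetical.
TREES = {
    1: [("ate", -1), ("dog", 0), ("bone", 0), ("big", 2)],
    2: [("ate", -1), ("cat", 0), ("bone", 0), ("small", 2)],
    3: [("said", -1), ("ate", 0), ("dog", 1), ("meat", 1), ("big", 3)],
}

def build_index(trees):
    """Map each word to an array of (document ID, position in tree)."""
    index = defaultdict(list)
    for doc_id in sorted(trees):
        for pos, (word, _parent) in enumerate(trees[doc_id]):
            index[word].append((doc_id, pos))
    return index

def depth_below(tree, node, ancestor):
    """Return how many edges `node` lies below `ancestor`, or None if it does not."""
    steps = 0
    while node != -1:
        if node == ancestor:
            return steps
        node = tree[node][1]
        steps += 1
    return None

def extend_pivot(trees, index, pivot, child_word, max_depth, top_n, upward=False):
    """Count words near pivot occurrences that have a child `child_word`."""
    counts = Counter()
    for doc_id, pos in index.get(pivot, []):  # sequential scan of the array
        tree = trees[doc_id]
        matched = {j for j, (w, p) in enumerate(tree) if p == pos and w == child_word}
        if not matched:
            continue  # the assumed retrieval pattern does not match here
        if upward:
            node, steps = tree[pos][1], 1
            while node != -1 and steps <= max_depth:
                counts[tree[node][0]] += 1
                node, steps = tree[node][1], steps + 1
        else:
            for j, (word, _parent) in enumerate(tree):
                depth = depth_below(tree, j, pos)
                if j not in matched and depth is not None and 1 <= depth <= max_depth:
                    counts[word] += 1
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For example, `extend_pivot(TREES, build_index(TREES), "ate", "dog", max_depth=2, top_n=2)` ranks the words found up to two levels below each matching "ate" node.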
Abstract:
PROBLEM TO BE SOLVED: To find a word to be registered from newly added text without omission, and to efficiently perform an operation when constructing a term dictionary of word categories. SOLUTION: A computer system includes a morphological analysis unit which acquires token sequence data by performing morphological analysis for text data, a category distinguishing unit which distinguishes respective tokens of the token sequence data by using a category dictionary to extract uncategorized words, an uncategorized-word comparing unit which compares each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word, and a token-sequence comparing unit which compares a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words, and comprises a permission unit which permits a user to select whether to register the registration candidate words in the category dictionary. COPYRIGHT: (C)2010,JPO&INPIT
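The candidate-extraction flow in this abstract might be sketched as follows. The dictionary contents, the stopword list, the regular expression standing in for the uncategorized-word comparison rule, and the "uncategorized word followed by a HARDWARE token" sequence rule are all hypothetical; a regex tokenizer stands in for real morphological analysis.

```python
import re

# Hypothetical category dictionary and rules, for illustration only.
CATEGORY_DICT = {"server": "HARDWARE", "disk": "HARDWARE", "failed": "SYMPTOM"}
STOPWORDS = {"the", "a", "on"}            # assumed: never proposed as candidates
WORD_RULE = re.compile(r"[A-Z]{2,}\d+")   # assumed rule: product-code-like words

def tokenize(text):
    """Stand-in for morphological analysis: split into word tokens."""
    return re.findall(r"\w+", text)

def find_candidates(text, dictionary):
    tokens = tokenize(text)
    cats = [dictionary.get(t.lower()) for t in tokens]
    found = []
    for i, (tok, cat) in enumerate(zip(tokens, cats)):
        if cat is not None or tok.lower() in STOPWORDS:
            continue  # only uncategorized words are examined
        if WORD_RULE.fullmatch(tok):
            found.append(tok)  # uncategorized-word comparison rule
        if i + 1 < len(tokens) and cats[i + 1] == "HARDWARE":
            found.append(f"{tok} {tokens[i + 1]}")  # token-sequence rule
    return found

def register(found, dictionary, approve, category="UNREVIEWED"):
    """Return a new dictionary containing only the candidates the user approved."""
    accepted = {term.lower(): category for term in found if approve(term)}
    return {**dictionary, **accepted}
```

The `approve` callback plays the role of the permission unit: nothing enters the dictionary unless the user accepts it.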
Abstract:
PROBLEM TO BE SOLVED: To quickly create the index of a large-scale database regardless of the capacity limits of main storage. SOLUTION: A document set is divided into subsets that do not overlap. The keywords appearing in each subset are grouped by the remainder obtained when the hash value of each keyword is divided by a fixed integer, and an index file is created for each group. The index files created for the individual document subsets are merged with those having the same group number, yielding one integrated index file per group number. At this point there are only as many index files as there are group numbers, and no index covering the whole document set has yet been obtained. These per-group index files are then merged further to generate an index file corresponding to the whole document set. COPYRIGHT: (C)2009,JPO&INPIT
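The two-stage merge can be sketched in memory as below. This is a simplified illustration, not the patented implementation: in-memory dictionaries stand in for index files, `NUM_GROUPS` is an arbitrary choice of the fixed integer, whitespace splitting stands in for keyword extraction, and `zlib.crc32` is used as the hash only because it is stable across runs.

```python
import zlib
from collections import defaultdict

NUM_GROUPS = 4  # assumed value of the fixed integer divisor

def group_of(keyword):
    """Group number = hash value of the keyword modulo the fixed integer."""
    return zlib.crc32(keyword.encode("utf-8")) % NUM_GROUPS

def index_subset(docs):
    """Build one partial index per group for a single document subset."""
    groups = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id].split())):
            groups[group_of(word)][word].append(doc_id)
    return groups

def merge(indexes):
    """Merge several keyword -> postings maps into one."""
    merged = defaultdict(list)
    for index in indexes:
        for word, postings in index.items():
            merged[word].extend(postings)
    return {word: sorted(postings) for word, postings in merged.items()}

def build_index(subsets):
    partials = [index_subset(subset) for subset in subsets]
    # Stage 1: merge partial indexes that share a group number.
    per_group = [merge(p.get(g, {}) for p in partials) for g in range(NUM_GROUPS)]
    # Stage 2: merge the per-group indexes into the index of the whole set.
    return merge(per_group)
```

Because a keyword always falls into the same group, stage 1 only ever has to hold one group's keywords at a time, which is what keeps the working set small.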
Abstract:
PROBLEM TO BE SOLVED: To retrieve document data while appropriately reflecting the contents of a retrieving sentence, and to appropriately detect the occurrence of problems from document data that is added sequentially. SOLUTION: A retrieval system retrieves document data including the contents of retrieving sentences from a plurality of document data. The retrieval system is provided with: a document database which stores a plurality of document data; a concept database which stores a plurality of concepts in a hierarchical structure; a document data concept extraction part which extracts document concepts corresponding to document data based on keywords included in the respective document data; a retrieving sentence concept extraction part which extracts retrieving sentence concepts based on keywords included in the retrieving sentences; a concept retrieval part which retrieves, from among the plurality of document data, the document data for which the retrieving sentence concepts are upper-hierarchy or lower-hierarchy concepts of the document concepts; and a retrieval result output part which outputs the document data retrieved by the concept retrieval part as the document data including the contents designated by the retrieving sentences. COPYRIGHT: (C)2006,JPO&NCIPI
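A minimal sketch of hierarchy-aware concept retrieval follows. The concept hierarchy `PARENT`, the keyword table `KEYWORD_TO_CONCEPT`, and the function names are hypothetical. Treating an identical concept as a match is also an assumption, since the abstract mentions only upper and lower hierarchy.

```python
# Hypothetical concept database: child concept -> parent concept.
PARENT = {
    "disc brake": "brake",
    "brake": "car part",
    "engine": "car part",
    "car part": "vehicle",
}

# Hypothetical keyword -> concept table used by both extraction parts.
KEYWORD_TO_CONCEPT = {
    "rotor": "disc brake",
    "brakes": "brake",
    "motor": "engine",
}

def ancestors(concept):
    """All concepts above `concept` in the hierarchy."""
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found

def concepts_of(text):
    """Extract concepts from the keywords that appear in a text."""
    words = text.lower().split()
    return {KEYWORD_TO_CONCEPT[w] for w in words if w in KEYWORD_TO_CONCEPT}

def related(a, b):
    """True if one concept equals, is above, or is below the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)

def search(query, documents):
    query_concepts = concepts_of(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(related(q, d) for q in query_concepts for d in concepts_of(text))
    ]
```

With this table, a query mentioning "brakes" retrieves a document that only mentions "rotor", because "brake" sits above "disc brake" in the hierarchy, while a document about "motor" is not returned.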
Abstract:
PROBLEM TO BE SOLVED: To provide a technology for acquiring evaluation information of an object in a virtual-reality space. SOLUTION: An evaluation acquisition apparatus includes: a first conversation data acquiring section for acquiring neighboring conversation data composed of conversations which are made around an object of interest to be evaluated; a second conversation data acquiring section for acquiring wide-range conversation data composed of conversations made in an area in a virtual-reality space wider than a neighboring area of the object of interest; and an acquiring section for specifying expressions frequently appearing in the neighboring conversation data by use of the neighboring conversation data and the wide-range conversation data to acquire the expressions as the evaluation information of the object of interest. COPYRIGHT: (C)2010,JPO&INPIT
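One way to pick out expressions that are frequent near the object relative to the wider area is a smoothed frequency ratio, sketched below. The abstract does not specify the statistic, so the ratio, the add-one smoothing, the `min_count` threshold, and the use of single whitespace-separated words as "expressions" are all assumptions.

```python
from collections import Counter

def characteristic_expressions(nearby, wide, top_n=3, min_count=2):
    """Rank words that are frequent in `nearby` relative to `wide`."""
    near_counts = Counter(w for line in nearby for w in line.lower().split())
    wide_counts = Counter(w for line in wide for w in line.lower().split())
    near_total = sum(near_counts.values())
    wide_total = sum(wide_counts.values())
    vocab = len(set(near_counts) | set(wide_counts))

    def lift(word):
        p_near = near_counts[word] / near_total
        # Add-one smoothing so words unseen in the wide data do not divide by zero.
        p_wide = (wide_counts[word] + 1) / (wide_total + vocab)
        return p_near / p_wide

    candidates = [w for w, c in near_counts.items() if c >= min_count]
    return sorted(candidates, key=lambda w: (-lift(w), w))[:top_n]
```

Words such as "is" that are common everywhere receive a low ratio, so the result leans toward words specific to the conversations around the object.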
Abstract:
PROBLEM TO BE SOLVED: To provide a method and a system for evaluating a trend analysis system. SOLUTION: As a first embodiment, a device for evaluating a trend analysis system comprises: an acceptable value input part for receiving acceptable values of false positives, in which irrelevant data is judged as relevant, and acceptable values of false negatives, in which relevant data is judged as irrelevant; and an accuracy rate calculation part for calculating the accuracy rate of the system. The accuracy rate calculation part comprises a weight determination part, which reads from a storage device accuracy data correctly indicating whether the data in an existing data aggregate stored there are related, and uses that accuracy data to determine, from the acceptable values of false positives and false negatives, a weight for the number of false positives of the system and a weight for the number of false negatives; and a calculation part, which calculates the accuracy rate of the system from the number and weight of the false positives, the number and weight of the false negatives, and the total number of data items. COPYRIGHT: (C)2008,JPO&INPIT
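The abstract does not give the weighting formula, so the following is only an illustrative stand-in: each error type is weighted in inverse proportion to its acceptable value, with the weights normalized so that equal tolerances reduce to ordinary accuracy. The function names and the clamp at zero are likewise assumptions.

```python
def count_errors(predicted, truth):
    """Count false positives and false negatives against ground-truth labels."""
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    return fp, fn

def weights(acceptable_fp, acceptable_fn):
    """Assumed rule: the less tolerated error type gets the larger weight.

    The two weights sum to 2, so equal tolerances give weight 1 each.
    """
    total = acceptable_fp + acceptable_fn
    return 2 * acceptable_fn / total, 2 * acceptable_fp / total

def weighted_accuracy(predicted, truth, acceptable_fp, acceptable_fn):
    fp, fn = count_errors(predicted, truth)
    w_fp, w_fn = weights(acceptable_fp, acceptable_fn)
    penalty = (w_fp * fp + w_fn * fn) / len(truth)
    return max(0.0, 1.0 - penalty)  # clamp so the rate never goes negative
```

Under this rule, a system evaluated for a user who tolerates false positives more than false negatives is penalized more heavily for each relevant item it misses.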
Abstract:
PROBLEM TO BE SOLVED: To provide a device, a method and a program for visualizing a Boolean expression so that it is easy to see what is added and what is excluded as conditions. SOLUTION: A Boolean expression to be visualized is input in the form of a binary tree in which each leaf node represents an operand of the Boolean expression and each non-leaf node represents an operator. The input binary tree is converted into a two-dimensional nested expression composed of a plurality of areas, and a drawing expression for visualization is created from the nested expression and displayed. When the Boolean expression is given as a character string, the character string is first converted into a binary tree. COPYRIGHT: (C)2008,JPO&INPIT
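The conversion from a binary tree to a nested expression can be sketched as follows. The `Node` class, the tuple form of the nested expression, and the indented text output are hypothetical stand-ins for the areas and drawing expression of the abstract. Collapsing chains of the same operator into one area is an added simplification, applied only to AND and OR because they are associative.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    op: str          # operator held by a non-leaf node, e.g. "AND"
    left: "Tree"
    right: "Tree"

Tree = Union[str, Node]      # a leaf is just the operand's name
ASSOCIATIVE = {"AND", "OR"}  # only these chains are safe to flatten

def to_nested(tree):
    """Convert a binary tree into (operator, [children]) areas."""
    if isinstance(tree, str):
        return tree
    children = []
    for child in (tree.left, tree.right):
        nested = to_nested(child)
        same_op = isinstance(nested, tuple) and nested[0] == tree.op
        if same_op and tree.op in ASSOCIATIVE:
            children.extend(nested[1])  # merge into the enclosing area
        else:
            children.append(nested)
    return (tree.op, children)

def draw(nested, depth=0):
    """Render the nested expression as indented, bracketed lines."""
    pad = "  " * depth
    if isinstance(nested, str):
        return [pad + nested]
    op, children = nested
    lines = [f"{pad}[{op}"]
    for child in children:
        lines.extend(draw(child, depth + 1))
    lines.append(pad + "]")
    return lines
```

Calling `print("\n".join(draw(to_nested(tree))))` shows each operator as a bracketed region enclosing its operands, a text analogue of nested areas on screen.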
Abstract:
To carry out an analysis of review text units efficiently with limited time and resources, a technique is provided for efficiently extracting, from a large number of review text units, specific text units to which someone (an analyst) should refer. A method for extracting specific text units from a plurality of text units by a computer includes the following steps: first, evaluating an amount of positive expressions and an amount of negative expressions in each of the text units; second, evaluating each of the text units on the basis of a plurality of evaluation functions, wherein at least certain evaluation functions of the plurality of evaluation functions use the amount of positive expressions and the amount of negative expressions as variables; and extracting a text unit whose evaluation result has a higher score in preference to a text unit with a lower score, wherein the individual evaluation results are based on the same evaluation function of the plurality of evaluation functions.
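A minimal sketch of scoring text units with several evaluation functions is shown below. The word lists, the three functions in `SCORERS`, and the whitespace tokenization are hypothetical; the abstract only requires that some functions take the positive and negative amounts as variables and that ranking compares scores from the same function.

```python
# Hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def amounts(text):
    """Return (amount of positive expressions, amount of negative expressions)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words), sum(w in NEGATIVE for w in words)

# Assumed evaluation functions over the two amounts.
SCORERS = {
    "mostly_positive": lambda pos, neg: pos - neg,
    "mostly_negative": lambda pos, neg: neg - pos,
    "mixed": lambda pos, neg: min(pos, neg),
}

def extract(texts, top_k=1):
    """For each evaluation function, pick the top_k highest-scoring text units."""
    scored = [(text, *amounts(text)) for text in texts]
    result = {}
    for name, scorer in SCORERS.items():
        # Scores are only compared within one function; ties keep input order.
        ranked = sorted(scored, key=lambda row: -scorer(row[1], row[2]))
        result[name] = [text for text, _pos, _neg in ranked[:top_k]]
    return result
```

Ranking separately per function lets an analyst see, for instance, the most one-sided reviews alongside the most mixed ones, rather than a single blended ordering.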
Abstract:
PROBLEM TO BE SOLVED: To retrieve document data while appropriately reflecting the content of a retrieving sentence, and to appropriately detect the occurrence of a problem in document data that is added sequentially. SOLUTION: This retrieval system retrieves the document data including the content of the retrieving sentence from a plurality of document data, and includes a document database for storing the plurality of document data, a concept database for storing a plurality of concepts in a hierarchical structure, a document data concept extraction part for extracting a document concept corresponding to each document data based on a keyword included in that document data, a retrieving sentence concept extraction part for extracting a retrieving sentence concept based on a keyword included in the retrieving sentence, a concept retrieval part for retrieving, out of the plurality of document data, the document data for which the retrieving sentence concept is in an upper or lower hierarchy of the document concept, and a retrieval result output part for outputting the document data retrieved by the concept retrieval part as the document data including the content designated by the retrieving sentence. COPYRIGHT: (C)2010,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To provide a method for efficient document masking. SOLUTION: As a first mode, this method has the steps of: decomposing a character string in a document into partial character strings; calculating, for each partial character string, a score that includes its appearance frequency; presenting the scores and the partial character strings to a user; determining the partial character strings selected by the user; storing the selected partial character strings as a safe character string list; and replacing every partial character string in the document that is not present in the safe character string list with a prescribed replacement character string. COPYRIGHT: (C)2007,JPO&INPIT
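The masking steps above can be sketched as follows. Treating each word as a partial character string, using the raw appearance count as the whole score, and the `"***"` replacement are assumptions; the function names are hypothetical.

```python
import re
from collections import Counter

def score_substrings(document):
    """Decompose the document into words and score each by appearance count."""
    return Counter(re.findall(r"\w+", document))

def mask(document, safe, replacement="***"):
    """Replace every word that is not in the safe list."""
    def keep_or_mask(match):
        word = match.group(0)
        return word if word in safe else replacement
    return re.sub(r"\w+", keep_or_mask, document)
```

In use, the scores from `score_substrings` would be shown to the user, whose selection becomes the `safe` set passed to `mask`. Because the safe list is an allow-list, any string the user did not explicitly review is masked by default.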