Abstract:
PROBLEM TO BE SOLVED: To efficiently re-collect documents which have been collected when a search system is reconfigured. SOLUTION: A document collection system includes: a first storage part for storing system configuration information about a search system; a second storage part for storing attribute information about each collected document and the system configuration information stored in the first storage part at the time of document collection; a comparison part for, when forced re-collection of documents is performed due to reconfiguration of the search system, comparing attribute information about a document to be collected and the system configuration information stored in the first storage part with the attribute information and system configuration information stored in the second storage part; and a document collection part for collecting documents according to a predetermined schedule in normal operation, and re-collecting only documents for which a mismatch is detected by the comparison part in the forced re-collection. COPYRIGHT: (C)2011,JPO&INPIT
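The comparison step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the system configuration is a mapping from document type to a filter version and that only the entry for a document's own type is relevant to that document; `relevant_config`, `forced_recollect`, and the attribute names are illustrative and are not taken from the abstract.

```python
def relevant_config(attrs, config):
    # Assumption: only the configuration entry for this document's type matters.
    return config.get(attrs["type"])


def forced_recollect(documents, config, store):
    """Return the ids of documents whose stored state no longer matches.

    documents: {doc_id: attribute dict} as currently observed
    config:    current system configuration (the "first storage part")
    store:     {doc_id: (attributes, config value)} saved at collection time
               (the "second storage part")
    """
    to_collect = []
    for doc_id, attrs in documents.items():
        current = (attrs, relevant_config(attrs, config))
        if store.get(doc_id) != current:  # mismatch, or never collected
            to_collect.append(doc_id)
    return to_collect


documents = {
    "a.pdf": {"type": "pdf", "mtime": 1},
    "b.txt": {"type": "txt", "mtime": 1},
}
old_config = {"pdf": "filter-v1", "txt": "filter-v1"}
store = {d: (a, relevant_config(a, old_config)) for d, a in documents.items()}

# Reconfiguration changes only the PDF filter.
new_config = {"pdf": "filter-v2", "txt": "filter-v1"}
recollect = forced_recollect(documents, new_config, store)
```

Under these assumptions only `a.pdf` is re-collected after the reconfiguration, while `b.txt` is skipped because both its attributes and its relevant configuration are unchanged.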
Abstract:
PROBLEM TO BE SOLVED: To provide a search system that can achieve a high-quality search of documents including compound words and the like, together with an index forming unit, a search engine, an index forming method, a search method, and a program. SOLUTION: An index forming unit 104 includes: character string analyzing sections 222, 224 which assign a token obtained by character string analysis to a character string extracted from information as a search target; a position defining section 232 which defines a position identification value identifying the position of the token in index information, and determines, for the assigned token, the corresponding range of position identification values; and index forming sections 230, 234 which form and register index data correlating the assigned token, an information identification value for identifying the information, and the position identification value of the assigned token, or which form and register index data for a token whose range extends over a plurality of position identification values, further associated with additional information showing the positions of the tokens adjacent to that token. COPYRIGHT: (C)2010,JPO&INPIT
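One possible reading of a token whose range spans several position values is sketched below. It assumes each analyzed token carries a start position and a span, so that a compound word such as "database" can cover the same positions as its parts "data" and "base". The function names and the tuple layout are hypothetical, not the patent's data format.

```python
def build_index(doc_id, analyzed):
    """analyzed: list of (token, start_position, span) from string analysis."""
    index = {}
    for token, start, span in analyzed:
        index.setdefault(token, []).append((doc_id, start, span))
    return index


def phrase_match(index, phrase):
    """Return ids of documents in which the phrase tokens are adjacent.

    A token is adjacent to the previous one when it starts exactly where
    the previous token's range ends (start + span).
    """
    ends = {(d, s + n) for d, s, n in index.get(phrase[0], [])}
    for tok in phrase[1:]:
        ends = {(d, s + n) for d, s, n in index.get(tok, []) if (d, s) in ends}
    return {d for d, _ in ends}


# "database system": the compound token spans positions 0-1.
analyzed = [("data", 0, 1), ("base", 1, 1), ("database", 0, 2), ("system", 2, 1)]
index = build_index(1, analyzed)
```

With this layout, both `["database", "system"]` and `["data", "base", "system"]` match document 1, while `["data", "system"]` does not, because "system" does not start where "data" ends.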
Abstract:
PROBLEM TO BE SOLVED: To realize a high-speed search processing in retrieval of a document database accumulating a structured document file. SOLUTION: A search file 31 used for search processing by a search engine 30 and retaining information showing the corresponding relation between a keyword and its position information comprises a key file 32 registering a character string contained in the document file accumulated in the document database 10 and a pointer to the position information related to the character string by document areas in which character strings appear in the document file; and a POS file 33 registering position information including the information for specifying, for each character string registered in the key file 32, a document file in which the character string is present and the information for specifying the position of the character string in the document file. COPYRIGHT: (C)2004,JPO&NCIPI
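The two-part search file can be sketched as follows, assuming an in-memory stand-in for both files: the key file maps a (character string, document area) pair to a pointer into the POS file, and the POS file is a flat list of (document, position) records. The names `build_search_file` and `lookup` and the pointer layout (offset plus count) are assumptions made for illustration.

```python
def build_search_file(docs):
    """docs: {doc_id: {area: [tokens]}} -> (key_file, pos_file)."""
    postings = {}
    for doc_id, areas in docs.items():
        for area, tokens in areas.items():
            for pos, tok in enumerate(tokens):
                postings.setdefault((tok, area), []).append((doc_id, pos))

    key_file, pos_file = {}, []
    for key in sorted(postings):
        # Pointer = (offset into pos_file, number of records).
        key_file[key] = (len(pos_file), len(postings[key]))
        pos_file.extend(postings[key])
    return key_file, pos_file


def lookup(key_file, pos_file, term, area):
    start, count = key_file.get((term, area), (0, 0))
    return pos_file[start:start + count]


docs = {
    "d1": {"title": ["fast", "search"], "body": ["search", "engine"]},
    "d2": {"body": ["search"]},
}
key_file, pos_file = build_search_file(docs)
```

Because keys are registered per document area, a query restricted to one area (for example the title) reads only that area's slice of the POS file.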
Abstract:
PROBLEM TO BE SOLVED: To enable highly accurate retrieval of string information stored in a computer. SOLUTION: A retrieval system 200 includes: a token division part 222 for dividing an input string to be retrieved into one or more tokens; a position definition part 232 for treating tokens registered for exclusion from appearance-position calculation as excluded tokens, treating the remaining tokens as headword tokens, and defining appearance positions for the headword tokens; an information addition part 234 for adding, to each excluded token, positional information whose origin is the headword token that the excluded token follows; and an indexing processing part 236 for indexing the tokens together with an indication of whether excluded tokens follow them. The retrieval system 200 additionally includes a retrieval processing part 250 for extracting, when retrieval processing that considers the excluded tokens is required, a token array that matches not only the retrieval token array included in a phrase retrieval query but also its appearance positions and positional information.
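The position scheme can be illustrated with a small sketch, assuming whitespace-separated English tokens and a hypothetical excluded-token set. Headword tokens receive consecutive appearance positions. Each excluded token records the position of the headword it follows plus an offset from that headword. The field names are illustrative only.

```python
def index_tokens(tokens, excluded):
    """Assign appearance positions to headwords and offsets to excluded tokens."""
    entries = []
    last_head = None
    pos = -1      # appearance position of the most recent headword token
    offset = 0    # distance of an excluded token from that headword
    for tok in tokens:
        if tok in excluded:
            offset += 1
            entries.append({"token": tok, "pos": pos, "offset": offset,
                            "followed_by_excluded": False})
            if last_head is not None:
                last_head["followed_by_excluded"] = True
        else:
            pos += 1
            offset = 0
            last_head = {"token": tok, "pos": pos, "offset": 0,
                         "followed_by_excluded": False}
            entries.append(last_head)
    return entries


entries = index_tokens(["state", "of", "the", "art", "search"], {"of", "the"})
```

In this sketch "state", "art", and "search" occupy positions 0, 1, and 2, so an ordinary phrase query for "state art" still finds adjacent headwords, while "of" and "the" remain recoverable as offsets 1 and 2 from position 0 when a query needs to consider them.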
Abstract:
PROBLEM TO BE SOLVED: To provide a retrieval engine, a retrieval system, a retrieval method, and a program. SOLUTION: A retrieval system 100 is configured to include a server 104, and includes a token assignment unit 222 assigning a plurality of kinds of tokens by applying different character string analyses; an index generation means 220 for generating an index list associating tokens assigned from the unit 222, kind discrimination values for discriminating the kinds of character string analyses, and information; a retrieval means 206 for performing retrieval by receiving retrieval words to check the information, by combining a plurality of kinds of retrieval tokens generated by the retrieval words, and by generating a single retrieval command to perform parallel check of the information; and a retrieval result creation means 214 for displaying the information extracted by the retrieval means 206 in association with the retrieval words through the parallel check and the retrieval tokens so that they can be discriminated. COPYRIGHT: (C)2009,JPO&INPIT
Abstract:
PROBLEM TO BE SOLVED: To provide a retrieval device, a retrieval system, a retrieval method, and a computer program that can quickly complete a retrieval presenting, to a user having authority to perform prescribed processing, the document files that satisfy a retrieval condition. SOLUTION: The retrieval device 1 includes: a correspondence relation information creation part 201 creating correspondence relation information indicating which users belong to which groups; an index information creation part 202 extracting, from the correspondence relation information, the user identification information associated with the group identification information included in the authority information assigned to a document file, and creating index information in which the extracted user identification information is associated with the document file; and a retrieval part 204 retrieving document files that satisfy the retrieval condition received by an input reception part 203 and for which the user identification information in the index information matches the received user information. COPYRIGHT: (C)2011,JPO&INPIT
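The idea of expanding group-level authority into per-user index entries at indexing time can be sketched as below. The function names and sample data are hypothetical. The sketch assumes the list of documents matching the retrieval condition (`hits`) has already been produced by some other search step.

```python
def users_by_group(memberships):
    """memberships: iterable of (user, group) pairs -> {group: {users}}."""
    table = {}
    for user, group in memberships:
        table.setdefault(group, set()).add(user)
    return table


def build_acl_index(doc_groups, table):
    """Expand each document's permitted groups into permitted users."""
    return {
        doc: set().union(*(table.get(g, set()) for g in groups))
        for doc, groups in doc_groups.items()
    }


def search(hits, user, acl_index):
    """Keep only the hits whose index entry contains the requesting user."""
    return [doc for doc in hits if user in acl_index.get(doc, set())]


table = users_by_group([("alice", "eng"), ("bob", "eng"), ("carol", "hr")])
acl_index = build_acl_index(
    {"spec.doc": ["eng"], "salary.xls": ["hr"], "memo.txt": ["eng", "hr"]},
    table,
)
hits = ["spec.doc", "salary.xls", "memo.txt"]
```

The design choice illustrated here is that group membership is resolved once, when the index is built, so a query needs only a single set-membership test per hit instead of a group lookup per document.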
Abstract:
PROBLEM TO BE SOLVED: To provide a retrieval technology that gives appropriate retrieval results while eliminating retrieval omissions. SOLUTION: When an index used for retrieval is built, each document is divided into tokens using two methods, i.e., morphological analysis and an N-gram method. The boundaries of all tokens are calculated from the plurality of token division results, and the appearance position information of each token and of the subsequent token is calculated, so that indexing is performed. A computer system also keeps a token string made up of all the token boundaries calculated from the plurality of token division results, in order to restore the original document and to highlight hit positions. The start position number and end position number of each token are kept together with this token string. When retrieval is performed, a retrieval word input by a user is divided by each of the token division methods, and the resulting token strings are combined with OR and retrieved. Since a document is included in the retrieval result when any one of the token strings matches, retrieval omissions are prevented. COPYRIGHT: (C)2011,JPO&INPIT
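The OR-combined query can be sketched as follows. Whitespace splitting is used here only as a stand-in for morphological analysis, which would need a real analyzer for Japanese text, and character bigrams serve as the N-gram method. All function names are hypothetical.

```python
def words(text):
    # Stand-in for morphological analysis (assumption: whitespace-delimited text).
    return text.split()


def ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_seq(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    span = len(needle)
    return any(haystack[i:i + span] == needle
               for i in range(len(haystack) - span + 1))


def matches(doc, query):
    """OR of the two tokenizations: a hit under either method counts."""
    return any(
        contains_seq(tokenize(doc), tokenize(query))
        for tokenize in (words, ngrams)
        if tokenize(query)  # skip a method that yields no query tokens
    )


doc = "full text searching"
```

In this sketch the query "search" misses under word tokens, because the document contains "searching", but hits under bigrams, so the OR-combined result still includes the document.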
Abstract:
The present invention provides an analysis apparatus which calculates the frequency that a word to be analyzed appears in character groups (sentences) in each type of context. The analysis apparatus includes a context storage section that stores context information indicating the positions of character groups in a predetermined context in a document, an index storage section that stores index information indicating the position(s), in the document, of each of a plurality of words included in the document, an input section for inputting a word to be analyzed, a position detection section that detects the position(s) of the word to be analyzed, included in the document from the index information, and a frequency detection section that detects the frequency that the word to be analyzed appears in each type of context in the document, from the position(s) of the word to be analyzed and the context information.
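The frequency detection can be sketched as below, assuming the context information is a list of (start, end, context type) spans with an exclusive end, and the index maps each word to its positions in the document. The sample positions and context labels are invented for illustration.

```python
from collections import Counter


def context_frequency(word, index, contexts):
    """Count how often `word` falls inside spans of each context type.

    index:    {word: [positions]}                (index information)
    contexts: [(start, end, context_type), ...]  (context information)
    """
    counts = Counter()
    for pos in index.get(word, []):
        for start, end, ctx in contexts:
            if start <= pos < end:
                counts[ctx] += 1
    return counts


index = {"battery": [3, 12, 40]}
contexts = [(0, 10, "complaint"), (10, 20, "question"), (35, 50, "complaint")]
freq = context_frequency("battery", index, contexts)
```

Because the word positions come from the index, the document text itself does not have to be rescanned for each analyzed word.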
Abstract:
The present invention calculates how frequently, and in which context of a character group (a sentence), a target word appears, and provides an analysis unit for analyzing a text document. The analysis unit includes: a context storage unit for storing context information indicating the position of a character group with a predetermined context in the document; an index storage unit for storing index information indicating, for each of a plurality of words contained in the document, the position of the word in the document; an input unit for inputting a target word; a position detection unit for detecting the position of the target word contained in the document; and a frequency detection unit for detecting the frequency of occurrence of the target word for each type of context in the document on the basis of the positions of the target word and the context information.