System and program for collecting document
    2.
    Invention patent
    Status: granted

    Publication number: JP2011039884A

    Publication date: 2011-02-24

    Application number: JP2009187980

    Filing date: 2009-08-14

    CPC classification number: G06F17/30864

    Abstract: PROBLEM TO BE SOLVED: To efficiently re-collect documents which have been collected when a search system is reconfigured.
    SOLUTION: A document collection system includes: a first storage part for storing system configuration information about a search system; a second storage part for storing attribute information about each collected document and the system configuration information stored in the first storage part at the time of document collection; a comparison part for, when forced re-collection of documents is performed due to reconfiguration of the search system, comparing attribute information about a document to be collected and the system configuration information stored in the first storage part with the attribute information and system configuration information stored in the second storage part; and a document collection part for collecting documents according to a predetermined schedule in normal operation, and re-collecting only documents for which a mismatch is detected by the comparison part in the forced re-collection.
    COPYRIGHT: (C)2011,JPO&INPIT
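The comparison step in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Snapshot` record, the `forced_recollect` function, and the idea that the relevant system configuration is looked up per document type are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """What was recorded for a document when it was last collected."""
    attributes: str  # hypothetical attribute, e.g. a last-modified stamp
    config: str      # configuration that applied to the document at that time


def forced_recollect(documents, store, system_config):
    """Re-collect only documents whose attributes or configuration changed.

    documents:     doc id -> (document type, current attributes)
    store:         doc id -> Snapshot, playing the role of the second storage part
    system_config: document type -> current setting (the first storage part)
    """
    recollected = []
    for doc_id, (doc_type, attributes) in documents.items():
        current = Snapshot(attributes, system_config.get(doc_type, ""))
        if store.get(doc_id) != current:   # mismatch, or never collected
            store[doc_id] = current        # record the state used for this run
            recollected.append(doc_id)
    return recollected
```

Under these assumptions, changing the setting for one document type causes only documents of that type to be collected again, while unchanged documents are skipped.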


    Search system, index forming unit, search engine, index forming method, search method, and program
    3.
    Invention patent
    Status: granted

    Publication number: JP2009252064A

    Publication date: 2009-10-29

    Application number: JP2008101031

    Filing date: 2008-04-09

    Abstract: PROBLEM TO BE SOLVED: To provide a search system that can achieve a high-quality search of documents including compound words and the like, together with an index forming unit, a search engine, an index forming method, a search method, and a program.
    SOLUTION: An index forming unit 104 includes: character string analyzing sections 222, 224 which assign a token obtained by character string analysis to a character string extracted from information as a search target; a position defining section 232 which defines a position identification value identifying a position of the token in index information, and determines, for the assigned token, the corresponding range of position identification values; and index forming sections 230, 234 which form and register index data correlating the assigned token, an information identification value for identifying the information, and the position identification value of the assigned token, or which, for a token whose corresponding range extends over a plurality of position identification values, form and register index data that additionally carries information showing the positions of the tokens adjacent to that token.
    COPYRIGHT: (C)2010,JPO&INPIT
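One way to picture a token whose range covers several position values is a positional index that stores an inclusive start and end position per posting. The sketch below is an assumption-laden illustration: `build_index`, `phrase_hits`, and the `(token, start, end)` layout are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_index(docs):
    """docs: doc id -> list of (token, start, end), positions inclusive."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for token, start, end in tokens:
            index[token].append((doc_id, start, end))
    return index


def phrase_hits(index, phrase):
    """Return (doc id, start, end) spans where the phrase tokens are adjacent."""
    if not phrase:
        return []
    hits = list(index.get(phrase[0], []))
    for token in phrase[1:]:
        postings = index.get(token, [])
        # A following token is adjacent when it starts right after the span ends.
        hits = [
            (doc_id, start, next_end)
            for doc_id, start, end in hits
            for next_doc, next_start, next_end in postings
            if next_doc == doc_id and next_start == end + 1
        ]
    return hits
```

Because a compound token such as "new york" is stored with the range 0 to 1, a query that uses the compound and a query that uses its parts can both reach the token at position 2.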


    Database search system and search method, method for forming data file used for search, and storage medium storing data file
    4.
    Invention patent
    Status: granted

    Publication number: JP2004220176A

    Publication date: 2004-08-05

    Application number: JP2003004572

    Filing date: 2003-01-10

    CPC classification number: G06F17/30619 Y10S707/99931 Y10S707/99933

    Abstract: PROBLEM TO BE SOLVED: To realize a high-speed search processing in retrieval of a document database accumulating a structured document file.
    SOLUTION: A search file 31 used for search processing by a search engine 30 and retaining information showing the corresponding relation between a keyword and its position information comprises a key file 32 registering a character string contained in the document file accumulated in the document database 10 and a pointer to the position information related to the character string by document areas in which character strings appear in the document file; and a POS file 33 registering position information including the information for specifying, for each character string registered in the key file 32, a document file in which the character string is present and the information for specifying the position of the character string in the document file.
    COPYRIGHT: (C)2004,JPO&NCIPI
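The two-file layout described here (a key file holding pointers, a POS file holding positions) can be sketched in memory as below. The function names, the `(offset, count)` pointer format, and the use of Python containers instead of on-disk files are assumptions for illustration only.

```python
def build_search_file(postings):
    """postings: (character string, document area) -> list of (doc id, position).

    Returns a key file mapping each key to an (offset, count) pointer, and a
    POS file holding all position entries in one flat list.
    """
    key_file = {}
    pos_file = []
    for key in sorted(postings):
        entries = sorted(postings[key])
        key_file[key] = (len(pos_file), len(entries))
        pos_file.extend(entries)
    return key_file, pos_file


def lookup(key_file, pos_file, term, area):
    """Follow the pointer for (term, area) and return its position entries."""
    offset, count = key_file.get((term, area), (0, 0))
    return pos_file[offset:offset + count]
```

Keying the pointers by document area lets a search restricted to, say, titles read only the slice of the POS file that belongs to that area.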


    Retrieval system, index preparation apparatus, retrieval device, index preparation method, retrieval method, and program
    5.
    Invention patent
    Status: under examination (published)

    Publication number: JP2013015967A

    Publication date: 2013-01-24

    Application number: JP2011147417

    Filing date: 2011-07-01

    CPC classification number: G06F17/30637 G06F17/271

    Abstract: PROBLEM TO BE SOLVED: To enable highly accurate retrieval of string information stored in a computer.
    SOLUTION: A retrieval system 200 includes: a token division part 222 for dividing an input string to be retrieved into one or more tokens; a position definition part 232 for treating tokens registered for exclusion from appearance position calculation as excluded tokens, treating all other tokens as headword tokens, and defining appearance positions with respect to the headword tokens; an information addition part 234 for adding positional information to the excluded tokens, with the headword token that the excluded tokens follow set as the origin; and an indexing processing part 236 for identifying, for each token, whether excluded tokens follow it, and indexing the tokens accordingly. The retrieval system 200 additionally includes a retrieval processing part 250 which, when retrieval processing that considers the excluded tokens is required, extracts a token array matching not only the retrieval token array included in a phrase retrieval query but also its appearance positions and positional information.
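The position scheme in this abstract can be illustrated with a small sketch in which headword tokens receive consecutive appearance positions and each excluded token receives an offset counted from the headword it follows. The function names, the `(token, position, offset)` layout, and the linear scan used in place of a real index are assumptions for this example.

```python
def assign_positions(tokens, excluded):
    """Return (token, position, offset) triples.

    Headword tokens get consecutive positions and offset 0. An excluded token
    keeps the position of the preceding headword and gets offset 1, 2, ...
    Excluded tokens before the first headword get position -1.
    """
    entries = []
    position, offset = -1, 0
    for token in tokens:
        if token in excluded:
            offset += 1
        else:
            position, offset = position + 1, 0
        entries.append((token, position, offset))
    return entries


def find_phrase(entries, phrase, excluded, strict):
    """Return the positions where the phrase starts.

    strict=False ignores excluded tokens on both sides; strict=True requires
    the excluded tokens to match as well.
    """
    if not strict:
        entries = [e for e in entries if e[2] == 0]
        phrase = [t for t in phrase if t not in excluded]
    if not phrase:
        return []
    tokens = [e[0] for e in entries]
    n = len(phrase)
    return [entries[i][1] for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase]
```

With "of" and "the" excluded, the phrase "state art" matches "state of the art" in the relaxed mode but not in the strict mode, which mirrors the two retrieval behaviors the abstract distinguishes.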

    COPYRIGHT: (C)2013,JPO&INPIT

    Retrieval engine, retrieval system, retrieval method, and program
    6.
    Invention patent
    Status: granted

    Publication number: JP2009205397A

    Publication date: 2009-09-10

    Application number: JP2008046582

    Filing date: 2008-02-27

    CPC classification number: G06F17/30616 G06F17/30634

    Abstract: PROBLEM TO BE SOLVED: To provide a retrieval engine, a retrieval system, a retrieval method, and a program.
    SOLUTION: A retrieval system 100 is configured to include a server 104, and includes a token assignment unit 222 assigning a plurality of kinds of tokens by applying different character string analyses; an index generation means 220 for generating an index list associating tokens assigned from the unit 222, kind discrimination values for discriminating the kinds of character string analyses, and information; a retrieval means 206 for performing retrieval by receiving retrieval words to check the information, by combining a plurality of kinds of retrieval tokens generated by the retrieval words, and by generating a single retrieval command to perform parallel check of the information; and a retrieval result creation means 214 for displaying the information extracted by the retrieval means 206 in association with the retrieval words through the parallel check and the retrieval tokens so that they can be discriminated.
    COPYRIGHT: (C)2009,JPO&INPIT
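A rough sketch of indexing with several analyses at once: each posting key carries a kind value naming the analysis that produced the token, and a single `search` call checks every kind and reports which one matched. The two analyzers (whitespace words and character bigrams), the function names, and the result format are assumptions chosen for brevity, not the analyses named in the patent.

```python
from collections import defaultdict


def words(text):
    return text.lower().split()


def bigrams(text):
    compact = "".join(text.lower().split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]


# Kind discrimination value -> analysis that produces tokens of that kind.
ANALYZERS = {"word": words, "bigram": bigrams}


def build_index(docs):
    """docs: doc id -> text. Returns (kind, token) -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for kind, analyze in ANALYZERS.items():
            for token in analyze(text):
                index[(kind, token)].add(doc_id)
    return index


def search(index, query):
    """Return doc id -> list of kinds under which all query tokens matched."""
    results = {}
    for kind, analyze in ANALYZERS.items():
        tokens = analyze(query)
        if not tokens:
            continue
        matched = set.intersection(*(index.get((kind, t), set()) for t in tokens))
        for doc_id in matched:
            results.setdefault(doc_id, []).append(kind)
    return results
```

Returning the matching kinds alongside each document is one simple way to let a result page distinguish an exact word hit from a looser substring hit.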


    Retrieval device, retrieval system, retrieval method, and computer program for retrieving document file stored in storage device
    7.
    Invention patent
    Status: under examination (published)

    Publication number: JP2011133928A

    Publication date: 2011-07-07

    Application number: JP2009290045

    Filing date: 2009-12-22

    CPC classification number: G06F21/6218 G06F2221/2141

    Abstract: PROBLEM TO BE SOLVED: To provide a retrieval device, a retrieval system, a retrieval method, and a computer program that complete, in a short time, a retrieval presenting document files that match a retrieval condition to a user who has authority to perform prescribed processing.
    SOLUTION: The retrieval device 1 includes: a correspondence relation information creation part 201 creating correspondence relation information indicating which groups each user belongs to; an index information creation part 202 extracting, from the correspondence relation information, the user identification information associated with the group identification information included in the authority information attached to a document file, and creating index information in which the extracted user identification information and the document file are associated; and a retrieval part 204 retrieving document files that match the retrieval condition received by an input reception part 203 and for which the user identification information included in the index information matches the user information received as input.
    COPYRIGHT: (C)2011,JPO&INPIT
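The idea of resolving group permissions into user identifiers at indexing time can be sketched as follows. All names, the whitespace tokenization, and the in-memory dictionaries are assumptions for illustration.

```python
def build_membership(groups):
    """groups: group id -> iterable of user ids (the correspondence relation)."""
    return {group: frozenset(users) for group, users in groups.items()}


def build_index(documents, membership):
    """documents: doc id -> (text, list of groups allowed to read it).

    Group ids are expanded to user ids here, once, so that a search does not
    need to resolve group membership for every candidate document.
    """
    index = {}
    for doc_id, (text, allowed_groups) in documents.items():
        users = set()
        for group in allowed_groups:
            users |= membership.get(group, frozenset())
        index[doc_id] = (set(text.lower().split()), users)
    return index


def search(index, term, user):
    """Return ids of documents that contain the term and that the user may read."""
    return sorted(doc_id for doc_id, (terms, users) in index.items()
                  if term.lower() in terms and user in users)
```

The trade-off in this sketch is that a change in group membership requires the affected index entries to be rebuilt, in exchange for a cheaper permission check at query time.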


    Information retrieval system, method and program, and index generation system, method, and program
    8.
    Invention patent
    Status: granted

    Publication number: JP2010250389A

    Publication date: 2010-11-04

    Application number: JP2009096383

    Filing date: 2009-04-10

    Abstract: PROBLEM TO BE SOLVED: To provide a retrieval technology that gives appropriate retrieval results while eliminating retrieval omissions.
    SOLUTION: When an index used for retrieval is built, each document is divided into tokens by two methods, i.e., morphological analysis and an N-gram method. The boundaries of all tokens are calculated from the plurality of token division results, and the appearance position information of each token and of the subsequent token is calculated so that indexing can be performed. The computer system also keeps a token string made up of all the token boundaries calculated from the plurality of token division results, in order to restore the original document and to highlight hit positions. The start position number and end position number of each token are kept together with it. When retrieval is performed, the retrieval word input by the user is divided by each of the token division methods, and the resulting token strings are combined with OR and retrieved. Because a document is included in the retrieval result when either token string matches, retrieval omissions are prevented.
    COPYRIGHT: (C)2011,JPO&INPIT
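The boundary calculation can be illustrated by merging the cut points of several segmentations of the same text and then expressing each token as a pair of position numbers on that merged scale. The functions and the character-offset span format below are assumptions for this sketch; the morphological segmentation is supplied by hand because no analyzer is specified.

```python
def char_ngrams(text, n=2):
    """Return the (start, end) character spans of all n-grams, end exclusive."""
    return [(i, i + n) for i in range(len(text) - n + 1)]


def merged_boundaries(*segmentations):
    """Collect every token boundary that appears in any segmentation."""
    cuts = set()
    for spans in segmentations:
        for start, end in spans:
            cuts.update((start, end))
    return sorted(cuts)


def to_position_numbers(spans, boundaries):
    """Rewrite character spans as (start number, end number) on the merged scale."""
    number = {offset: i for i, offset in enumerate(boundaries)}
    return [(number[start], number[end]) for start, end in spans]
```

Because both segmentations are numbered against the same boundary list, tokens from either method can be compared for adjacency or mapped back to the original text for highlighting.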


    Apparatus for analyzing text document, program, and method

    Publication number: GB2511015A

    Publication date: 2014-08-20

    Application number: GB201410245

    Filing date: 2013-01-11

    Applicant: IBM

    Abstract: The present invention provides an analysis apparatus which calculates the frequency that a word to be analyzed appears in character groups (sentences) in each type of context. The analysis apparatus includes a context storage section that stores context information indicating the positions of character groups in a predetermined context in a document, an index storage section that stores index information indicating the position(s), in the document, of each of a plurality of words included in the document, an input section for inputting a word to be analyzed, a position detection section that detects the position(s) of the word to be analyzed, included in the document from the index information, and a frequency detection section that detects the frequency that the word to be analyzed appears in each type of context in the document, from the position(s) of the word to be analyzed and the context information.
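A compact sketch of the frequency detection step: word positions come from an index, context spans come from a separate table, and each position is assigned to the span that contains it. The data layout (sorted, non-overlapping spans with an exclusive end) and the names used are assumptions for the example.

```python
from bisect import bisect_right
from collections import Counter


def context_frequencies(index, contexts, word):
    """Count occurrences of a word per context type.

    index:    word -> list of positions in the document
    contexts: list of (start, end, type), sorted by start, non-overlapping,
              with end exclusive
    """
    starts = [start for start, _, _ in contexts]
    counts = Counter()
    for position in index.get(word, []):
        i = bisect_right(starts, position) - 1   # last span starting at or before position
        if i >= 0 and position < contexts[i][1]:
            counts[contexts[i][2]] += 1
    return counts
```

Positions that fall outside every span are simply not counted, so text without an assigned context does not affect the per-type totals.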

    Unit, program, and method for analyzing text documents (Einheit, Programm und Verfahren zum Analysieren von Textdokumenten)

    Publication number: DE112013000981T5

    Publication date: 2014-11-27

    Application number: DE112013000981

    Filing date: 2013-01-11

    Applicant: IBM

    Abstract: The present invention calculates how frequently, and in which context of a character group (a sentence), a target word occurs. It provides an analysis unit for analyzing a text document that includes: a context storage unit for storing context information showing the position of a character group with a predetermined context in the document; an index storage unit for storing index information showing, for each of a plurality of words contained in the document, the position of the word in the document; an input unit for inputting a target word; a position detection unit for detecting the position of the target word contained in the document; and a frequency detection unit for detecting the frequency of occurrence of the target word for each type of context in the document on the basis of the positions of the target word and the context information.
