Method and system for performing search queries using and building a block-level index

    Publication No.: GB2520936A

    Publication date: 2015-06-10

    Application No.: GB201321286

    Filing date: 2013-12-03

    Applicant: IBM

    Abstract: A search query is performed by providing a first data structure containing information about the correlation between a specific search term attribute and at least one block ID of a block being part of a document, identical blocks having the same block ID; providing a second data structure containing information about the correlation between blocks and documents; processing the search query by searching the first data structure for at least one search term attribute and mapping that to the second data structure to retrieve the desired document or documents. The first data structure is preferably an index, each entry being associated with a specific search term attribute and the second data structure is preferably a list of which block is contained in which document.
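
The two data structures described in this abstract can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes a block ID is a truncated content hash (so identical blocks share an ID), treats each whitespace-separated word as a search term attribute, and uses the hypothetical names `block_id`, `build_index` and `search`.

```python
import hashlib
from collections import defaultdict


def block_id(text):
    """Assumption: the block ID is a content hash, so identical blocks share an ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_index(documents):
    """documents maps a document name to the list of text blocks it contains."""
    term_to_blocks = defaultdict(set)  # first data structure: term -> block IDs
    block_to_docs = defaultdict(set)   # second data structure: block ID -> documents
    for doc_name, blocks in documents.items():
        for text in blocks:
            bid = block_id(text)
            if bid not in block_to_docs:
                # A block shared by many documents is tokenized and indexed only once.
                for term in text.lower().split():
                    term_to_blocks[term].add(bid)
            block_to_docs[bid].add(doc_name)
    return term_to_blocks, block_to_docs


def search(term, term_to_blocks, block_to_docs):
    """Look the term up in the block-level index, then map block IDs to documents."""
    docs = set()
    for bid in term_to_blocks.get(term.lower(), ()):
        docs |= block_to_docs.get(bid, set())
    return docs


documents = {
    "a.txt": ["shared disclaimer text", "alpha report"],
    "b.txt": ["shared disclaimer text", "beta report"],
}
term_to_blocks, block_to_docs = build_index(documents)
print(sorted(search("disclaimer", term_to_blocks, block_to_docs)))
print(sorted(search("alpha", term_to_blocks, block_to_docs)))
```

Because the index is keyed on blocks rather than documents, the shared block in the example contributes a single entry per term no matter how many documents contain it.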

Method and system for data de-duplication

    Publication No.: GB2513341A

    Publication date: 2014-10-29

    Application No.: GB201307333

    Filing date: 2013-04-23

    Applicant: IBM

    Abstract: A file de-duplication system compares the quality of matching files held in different formats so that the higher-quality version is the one stored. A transformation matrix (2C3) indicative of the conversions possible between the file formats is used. On receiving a request to store a first file having a first format, where a matching stored file has a second format, the first file is deleted if the second format has a higher quality indicator value than the first format and the second format is convertible to the first format; otherwise the first file is stored. On receiving a request to retrieve a file in a given format, the stored file is converted to the requested format and the converted file is sent if the transformation matrix indicates that the stored format is convertible to the requested format; otherwise the stored file is sent unconverted. Stored files in different formats may be ranked according to quality, which may be based on sampling rate, resolution, compression ratio and information richness of the content for a media file or an e-mail format.
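
The store and retrieve rules in this abstract can be sketched as below. This is a toy model under stated assumptions, not the patented system: matching files are identified by a caller-supplied content key, the transformation matrix is reduced to a set of (source, target) format pairs, the quality indicator values are made up, and `DedupStore` and `convert` are hypothetical names (`convert` only tags the payload instead of transcoding it).

```python
def convert(payload, src, dst):
    """Hypothetical stand-in for a real transcoder; it only tags the payload."""
    return payload if src == dst else f"{payload} [{src}->{dst}]"


class DedupStore:
    """Toy format-aware de-duplication store (illustrative only)."""

    def __init__(self, quality, convertible):
        self.quality = quality          # format -> quality indicator value
        self.convertible = convertible  # (source, target) pairs: the transformation matrix
        self.copies = {}                # content key -> {format: payload}

    def can_convert(self, src, dst):
        return src == dst or (src, dst) in self.convertible

    def store(self, key, fmt, payload):
        """Return True if the file was stored, False if it was dropped as redundant."""
        held = self.copies.setdefault(key, {})
        for held_fmt in held:
            # Drop the incoming file when a better copy can reproduce its format.
            if self.quality[held_fmt] > self.quality[fmt] and self.can_convert(held_fmt, fmt):
                return False
        held[fmt] = payload
        return True

    def retrieve(self, key, wanted_fmt):
        """Return (format, payload), converting only when the matrix allows it."""
        held = self.copies[key]
        ranked = sorted(held, key=lambda f: self.quality[f], reverse=True)
        for fmt in ranked:
            if self.can_convert(fmt, wanted_fmt):
                return wanted_fmt, convert(held[fmt], fmt, wanted_fmt)
        best = ranked[0]
        return best, held[best]  # no conversion path: send the stored file unconverted


store = DedupStore(quality={"wav": 2, "mp3": 1}, convertible={("wav", "mp3")})
print(store.store("song", "wav", "PCM"))   # first copy is kept
print(store.store("song", "mp3", "MP3"))   # redundant: wav is better and converts to mp3
print(store.retrieve("song", "mp3"))       # served by converting the wav copy
print(store.retrieve("song", "ogg"))       # no conversion path, wav sent as-is
```

Keeping several formats per content key is a design choice of this sketch: it avoids discarding a high-quality copy when a lower-quality file arrives in a format the matrix cannot reach.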
