-
Publication number: GB2520936A
Publication date: 2015-06-10
Application number: GB201321286
Application date: 2013-12-03
Applicant: IBM
Inventor: HAMPP-BAHNMUELLER THOMAS, JIANG PENG HUI, JIANG PI JUN, XU YAN
IPC: G06F17/30
Abstract: A search query is performed by providing a first data structure containing information about the correlation between a specific search term attribute and at least one block ID of a block being part of a document, identical blocks having the same block ID; providing a second data structure containing information about the correlation between blocks and documents; processing the search query by searching the first data structure for at least one search term attribute and mapping that to the second data structure to retrieve the desired document or documents. The first data structure is preferably an index, each entry being associated with a specific search term attribute and the second data structure is preferably a list of which block is contained in which document.
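The two data structures described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `BlockIndex`, the use of a SHA-256 content hash as the block ID, and whitespace-separated lowercase words as the "search term attributes" are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class BlockIndex:
    """Hypothetical two-level index: term -> block IDs -> documents."""

    def __init__(self):
        # First data structure: search term -> IDs of blocks containing it.
        self.term_to_blocks = defaultdict(set)
        # Second data structure: block ID -> documents containing that block.
        self.block_to_docs = defaultdict(set)

    @staticmethod
    def block_id(block_text):
        # Assumption: identical blocks get the same ID via a content hash.
        return hashlib.sha256(block_text.encode("utf-8")).hexdigest()

    def add_document(self, doc_id, blocks):
        for text in blocks:
            bid = self.block_id(text)
            # A block seen before is already indexed; only record the new
            # document against it in the second data structure.
            if bid not in self.block_to_docs:
                for term in text.lower().split():
                    self.term_to_blocks[term].add(bid)
            self.block_to_docs[bid].add(doc_id)

    def search(self, term):
        # Look up block IDs in the index, then map them to documents.
        docs = set()
        for bid in self.term_to_blocks.get(term.lower(), ()):
            docs |= self.block_to_docs[bid]
        return docs
```

In this sketch a block shared by many documents is tokenized and indexed once, and each further document containing it only adds one entry to the block-to-document list.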
-
Publication number: GB2513341A
Publication date: 2014-10-29
Application number: GB201307333
Application date: 2013-04-23
Applicant: IBM
Inventor: BAESSLER MICHAEL, JIANG PENG HUI, JIANG PI JUN
Abstract: The quality of matching file formats is compared so that the files of higher quality are stored in a file deduplication system. A transformation matrix (2C3) indicative of conversions between the file formats is used. On receiving a request to store a first file having a first format, the first file is deleted if a second format has a higher quality indicator value than the first format and the second format is convertible to the first format; otherwise the first file is stored. On receiving a request to retrieve a file in a given file format, if the transformation matrix indicates that the format of the stored file is convertible to the requested format, the stored file is converted to the requested format and the converted file is sent; otherwise the unconverted stored file is sent. Stored files in different formats may be ranked according to quality, which may be based on sampling rate, resolution, compression ratio and information richness of the content for a media file or an e-mail format.
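The store and retrieve decisions in this abstract can be sketched in Python as follows. This is an illustrative reading, not the patented method: the class `FormatAwareStore`, the caller-supplied `content_key` that identifies matching files, the representation of the transformation matrix as a set of `(source, target)` pairs, and the `convert` callback are all hypothetical.

```python
class FormatAwareStore:
    """Hypothetical deduplicating store that prefers higher-quality formats."""

    def __init__(self, quality, convertible):
        self.quality = quality          # format -> quality indicator value
        self.convertible = convertible  # set of (source_format, target_format)
        self.files = {}                 # content_key -> {format: payload}

    def can_convert(self, src, dst):
        return src == dst or (src, dst) in self.convertible

    def store(self, content_key, fmt, payload):
        """Return True if the file was stored, False if it was discarded."""
        versions = self.files.setdefault(content_key, {})
        for stored_fmt in versions:
            # Discard the incoming file when a stored version is of higher
            # quality and can be converted to the incoming format on demand.
            if (self.quality[stored_fmt] > self.quality[fmt]
                    and self.can_convert(stored_fmt, fmt)):
                return False
        versions[fmt] = payload
        return True

    def retrieve(self, content_key, requested_fmt, convert):
        """Return (format, payload), converting when the matrix allows it."""
        versions = self.files[content_key]
        ranked = sorted(versions, key=lambda f: self.quality[f], reverse=True)
        for stored_fmt in ranked:
            if self.can_convert(stored_fmt, requested_fmt):
                payload = versions[stored_fmt]
                if stored_fmt == requested_fmt:
                    return stored_fmt, payload
                return requested_fmt, convert(payload, stored_fmt, requested_fmt)
        # No stored version is convertible: send the best one unconverted.
        best = ranked[0]
        return best, versions[best]
```

Keeping several formats per content key is a design choice of this sketch; the abstract does not say how many versions of a file are retained.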
-