Compressing table data using dictionary encoding

    Publication number: GB2506184A

    Publication date: 2014-03-26

    Application number: GB201217036

    Filing date: 2012-09-25

    Applicant: IBM

    Abstract: A computer-implemented method for compressing table data using dictionary encoding comprises the steps of: providing at least one table (T) of uncompressed data arranged in columns and rows (2); subdividing the table into at least a first and a second block of complete rows (3); for the first block of rows, determining information about the frequency of occurrence of different values for each column (4); evaluating and selecting row(s) to be removed from the first block, using the frequency-of-occurrence information, so as to reduce code-word length (5); removing the selected row(s) from the first block, resulting in an updated first block; determining information about the frequency of occurrence for the updated first block; deriving at least one dictionary containing code-words for the updated first block and encoding its values accordingly; and adding the removed row(s) to the second block. The row(s) may be selected using frequency partitioning, by dividing values into column partitions according to frequency histograms and forming cells by building the cross-product of the column partitions (fig. 3).
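
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `min_count` threshold, the rule "a row is removed if any of its values is rare in its column", and the fixed-width code model are all assumptions made for illustration. The abstract only says that rows are selected using frequency-of-occurrence information so that code-word length is reduced.

```python
from collections import Counter
from math import ceil, log2


def column_frequencies(block):
    """Count how often each value occurs in each column of a block of rows."""
    if not block:
        return []
    return [Counter(column) for column in zip(*block)]


def code_width(n_distinct):
    """Bits per value for a fixed-width code (an assumed code model)."""
    return ceil(log2(n_distinct)) if n_distinct > 1 else 0


def compress_first_block(first, second, min_count=2):
    """Move rows with rare values to the second block, then encode the rest.

    Hypothetical selection rule: a row is removed when any of its values
    occurs fewer than `min_count` times in that column of the first block.
    This is a single pass; values may drop below the threshold afterwards.
    """
    freqs = column_frequencies(first)
    kept, removed = [], []
    for row in first:
        has_rare_value = any(freqs[c][v] < min_count for c, v in enumerate(row))
        (removed if has_rare_value else kept).append(row)

    # Frequencies are determined again for the updated first block.
    updated_freqs = column_frequencies(kept)

    # One dictionary per column; more frequent values get smaller codes.
    dictionaries = [
        {value: code for code, (value, _) in enumerate(f.most_common())}
        for f in updated_freqs
    ]
    encoded = [tuple(d[v] for d, v in zip(dictionaries, row)) for row in kept]

    # The removed rows are appended to the second block.
    return dictionaries, encoded, second + removed


first_block = [("DE", "red"), ("DE", "red"), ("US", "red"),
               ("US", "blue"), ("FR", "red")]
second_block = [("DE", "blue")]

dictionaries, encoded, new_second = compress_first_block(first_block, second_block)
widths_before = [code_width(len(f)) for f in column_frequencies(first_block)]
widths_after = [code_width(len(d)) for d in dictionaries]
```

In this toy input the rows containing `"blue"` and `"FR"` move to the second block, so the first column needs one fewer bit per value and the second column collapses to a single value. Whether such a move pays off overall depends on how the second block is encoded, which this sketch does not model.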
