Data value occurrence information for data compression

    Publication number: GB2490068A

    Publication date: 2012-10-17

    Application number: GB201213200

    Application date: 2010-12-07

    Applicant: IBM

    Abstract: Generation of occurrence data for data values is discussed, for enabling encoding of a data set. Occurrences of data values in a current data batch are determined. Occurrence count information is determined for at most a first number (M) of the most frequent data values in the current data batch; this information identifies the most frequent data values and their occurrence counts. For the rest of the data values in the current data batch, at least a first histogram having a second number (N) of buckets is generated. The occurrence count information and the first histogram of the current data batch are merged into the merged occurrence count information and merged histogram of the data batches processed earlier. The next data batch is then processed as the current data batch, until the whole data set has been processed. An encoding scheme is determined based at least on the merged occurrence count information and the merged histogram corresponding to the data set.
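
The batch-wise procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes numeric data values in a known range [lo, hi), equal-width histogram buckets, and a simple merge rule in which any value that falls out of the merged top M is moved into the histogram. The function names (bucket_of, summarize_batch, merge_summaries, summarize_data_set) are hypothetical and chosen only for this example.

```python
from collections import Counter


def bucket_of(value, lo, hi, n):
    """Map a value in [lo, hi) to one of n equal-width buckets (assumed layout)."""
    width = (hi - lo) / n
    return min(int((value - lo) / width), n - 1)


def summarize_batch(batch, m, n, lo, hi):
    """Return (top, hist): exact counts for at most m most frequent values,
    and an n-bucket histogram covering the rest of the batch."""
    counts = Counter(batch)
    top = dict(counts.most_common(m))
    hist = [0] * n
    for value, count in counts.items():
        if value not in top:
            hist[bucket_of(value, lo, hi, n)] += count
    return top, hist


def merge_summaries(a, b, m, n, lo, hi):
    """Merge two (top, hist) summaries. Assumed rule: add exact counts,
    keep the m largest, and move the remainder into the histogram."""
    top_a, hist_a = a
    top_b, hist_b = b
    combined = Counter(top_a)
    combined.update(top_b)
    top = dict(combined.most_common(m))
    hist = [x + y for x, y in zip(hist_a, hist_b)]
    for value, count in combined.items():
        if value not in top:
            hist[bucket_of(value, lo, hi, n)] += count
    return top, hist


def summarize_data_set(batches, m, n, lo, hi):
    """Process batches one at a time, merging each into the running summary."""
    merged = ({}, [0] * n)
    for batch in batches:
        current = summarize_batch(batch, m, n, lo, hi)
        merged = merge_summaries(merged, current, m, n, lo, hi)
    return merged


if __name__ == "__main__":
    batches = [[1, 1, 1, 2, 2, 3], [1, 1, 4, 4, 4, 9]]
    top, hist = summarize_data_set(batches, m=2, n=2, lo=0, hi=10)
    print(top, hist)
```

Under this merge rule the exact counts are approximate: occurrences of a value that were already folded into a histogram in an earlier batch cannot be recovered if the value later becomes frequent. The total number of occurrences is still preserved across the top-M counts and the histogram, which is the information an encoder could then use to choose, for example, shorter codes for the most frequent values.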
