Systems and methods for searching and storage of data

    Publication No.: AU2005284737B2

    Publication date: 2011-03-10

    Application No.: AU2005284737

    Application date: 2005-09-15

    Applicant: IBM

    Abstract: A method comprising identifying input data in repository data wherein the repository data comprises repository data chunks and the input data comprise input data chunks and wherein each repository data chunk has a corresponding set of repository data chunk distinguishing characteristics, each distinguishing characteristic being stored with an RDC characteristic location, the method including the steps of, for each input data chunk: determining a set of input data chunk distinguishing characteristics, each distinguishing characteristic having an IDC characteristic location; then comparing the determined set of IDCs to one or more sets of RDCs; identifying a repository data chunk that is similar to the input data chunk as a function of the comparing of the determined set of IDCs to the one or more sets of RDCs, wherein a repository data chunk is identified as similar when a predetermined number of the distinguishing characteristics in the set of IDCs is found to match in a set of RDCs; outputting the IDC and RDC locations of at least one pair of matching IDC and RDC; and computing at least one common section of the input data chunk and the identified similar repository data chunk using the at least one pair of matching IDC and RDC as an anchor to define corresponding intervals in the input data chunk and the identified similar repository data chunk.
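The steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the abstract does not say how distinguishing characteristics are computed, so the sketch assumes they are the `K` largest hash values over sliding windows of `SEED` bytes, each stored with the offset of its window. The names `characteristics`, `find_similar`, `common_section`, `SEED`, `K`, and `threshold` are all hypothetical.

```python
import hashlib

SEED = 8  # assumed window size in bytes (hypothetical parameter)
K = 4     # assumed number of characteristics kept per chunk (hypothetical)


def characteristics(chunk: bytes, k: int = K, seed: int = SEED) -> dict:
    """Map each distinguishing characteristic to the location of its window.

    Assumption: the characteristics are the k largest window hashes.
    """
    hashes = {}
    for pos in range(len(chunk) - seed + 1):
        digest = hashlib.blake2b(chunk[pos:pos + seed], digest_size=8).digest()
        # Keep the first location at which each hash value occurs.
        hashes.setdefault(int.from_bytes(digest, "big"), pos)
    top = sorted(hashes, reverse=True)[:k]
    return {h: hashes[h] for h in top}


def find_similar(input_chunk: bytes, repo_index: dict, threshold: int = 2):
    """Return (chunk_id, [(idc_location, rdc_location), ...]) or None.

    A repository chunk counts as similar when at least `threshold`
    characteristics of the input chunk also appear in its stored set.
    """
    idc = characteristics(input_chunk)
    best = None
    for chunk_id, rdc in repo_index.items():
        pairs = [(idc[h], rdc[h]) for h in idc if h in rdc]
        if len(pairs) >= threshold and (best is None or len(pairs) > len(best[1])):
            best = (chunk_id, pairs)
    return best


def common_section(a: bytes, b: bytes, a_pos: int, b_pos: int, seed: int = SEED):
    """Grow a matching window pair (the anchor) in both directions.

    Returns (start_in_a, start_in_b, length) of the common section.
    Assumes the two anchor windows really are byte-identical.
    """
    start_a, start_b = a_pos, b_pos
    while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
        start_a -= 1
        start_b -= 1
    end_a, end_b = a_pos + seed, b_pos + seed
    while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
        end_a += 1
        end_b += 1
    return start_a, start_b, end_a - start_a
```

In use, `repo_index` would hold one stored characteristic set per repository chunk, so the similarity search touches only these small sets; the chunk bytes themselves are read only when an anchor is extended into a common section.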

    Scalable deduplication system with small blocks

    Publication No.: DE112012003503T5

    Publication date: 2014-09-25

    Application No.: DE112012003503

    Application date: 2012-09-10

    Applicant: IBM

    Abstract: Exemplary embodiments of methods, systems, and computer program products are provided for scalable data deduplication that works with small data chunks in a computing environment. In one embodiment, by way of example only, for each of the small data chunks, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.

    Scalable deduplication system with small blocks

    Publication No.: GB2508325A

    Publication date: 2014-05-28

    Application No.: GB201406218

    Application date: 2012-09-10

    Applicant: IBM

    Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each of the small data chunks, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
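As a rough illustration of the signature idea, the sketch below combines a character-presence bitmap with a compact encoding of the most frequent byte values. The abstract does not specify either representation, so the 32-bit folded bitmap, the `top` parameter, and the function name `small_chunk_signature` are all assumptions made for this example.

```python
from collections import Counter


def small_chunk_signature(chunk: bytes, top: int = 4) -> int:
    """Combine 'which characters appear' with 'which are most frequent'.

    Both encodings are assumptions; the abstract only states that the two
    representations are combined into one signature.
    """
    # Presence part: one bit per byte value, folded into 32 bits.
    presence = 0
    for byte in set(chunk):
        presence |= 1 << (byte % 32)

    # Frequency part: the `top` most frequent byte values, ordered by
    # descending count and then by byte value so the result is deterministic.
    counts = Counter(chunk)
    ranked = sorted(counts, key=lambda byte: (-counts[byte], byte))[:top]
    frequency = 0
    for byte in ranked:
        frequency = (frequency << 8) | byte

    # Concatenate the two parts into a single integer signature.
    return (presence << (8 * top)) | frequency
```

Under these assumptions, chunks with equal signatures would be treated as candidates for a full comparison, which is one way the signature could help select the data to be deduplicated.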
