-
Publication No.: AU2005284737B2
Publication Date: 2011-03-10
Application No.: AU2005284737
Application Date: 2005-09-15
Applicant: IBM
Inventor: ARONOVICH LIOR, HIRSCH MICHAEL, BACHMAT EITAN, BITNER HAIM, KLEIN SHMUEL T, ASHER RON
IPC: G06F17/30
Abstract: A method comprising identifying input data in repository data wherein the repository data comprises repository data chunks and the input data comprise input data chunks and wherein each repository data chunk has a corresponding set of repository data chunk distinguishing characteristics, each distinguishing characteristic being stored with an RDC characteristic location, the method including the steps of, for each input data chunk: determining a set of input data chunk distinguishing characteristics, each distinguishing characteristic having an IDC characteristic location; then comparing the determined set of IDCs to one or more sets of RDCs; identifying a repository data chunk that is similar to the input data chunk as a function of the comparing of the determined set of IDCs to the one or more sets of RDCs, wherein a repository data chunk is identified as similar when a predetermined number of the distinguishing characteristics in the set of IDCs is found to match in a set of RDCs; outputting the IDC and RDC locations of at least one pair of matching IDC and RDC; and computing at least one common section of the input data chunk and the identified similar repository data chunk using the at least one pair of matching IDC and RDC as an anchor to define corresponding intervals in the input data chunk and the identified similar repository data chunk.
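The steps in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the patent text: each distinguishing characteristic is taken to be one of the `K` largest hash values over fixed-length seeds of a chunk, stored with the seed's location; the hash (`blake2b`), the seed length `SEED`, `K`, and the threshold `MIN_MATCHES` are hypothetical choices; and the common section is found by simply extending byte-by-byte in both directions from one matching anchor pair.

```python
import hashlib

SEED = 8          # hypothetical seed length in bytes
K = 4             # hypothetical number of distinguishing characteristics per chunk
MIN_MATCHES = 2   # hypothetical "predetermined number" of matching characteristics


def seed_hash(seed: bytes) -> int:
    # Stand-in hash; a production system would more likely use a rolling hash.
    return int.from_bytes(hashlib.blake2b(seed, digest_size=8).digest(), "big")


def distinguishing_characteristics(chunk: bytes) -> dict:
    """Return {hash value: location} for the K largest seed hashes of the chunk."""
    hashes = {}
    for pos in range(len(chunk) - SEED + 1):
        # Keep the first location at which each hash value occurs.
        hashes.setdefault(seed_hash(chunk[pos:pos + SEED]), pos)
    top = sorted(hashes, reverse=True)[:K]
    return {h: hashes[h] for h in top}


def find_similar(input_chunk: bytes, repository: dict):
    """Return (repo_id, [(idc_location, rdc_location), ...]) for the first
    repository chunk sharing at least MIN_MATCHES characteristics, else None.

    `repository` maps a chunk id to that chunk's {hash: location} dict."""
    idcs = distinguishing_characteristics(input_chunk)
    for repo_id, rdcs in repository.items():
        pairs = [(loc, rdcs[h]) for h, loc in idcs.items() if h in rdcs]
        if len(pairs) >= MIN_MATCHES:
            return repo_id, pairs
    return None


def common_section(a: bytes, b: bytes, a_pos: int, b_pos: int):
    """Extend one anchor pair backwards and forwards while the bytes agree.

    Returns (start in a, start in b, length) of the common interval."""
    back = 0
    while (a_pos - back > 0 and b_pos - back > 0
           and a[a_pos - back - 1] == b[b_pos - back - 1]):
        back += 1
    fwd = 0
    while (a_pos + fwd < len(a) and b_pos + fwd < len(b)
           and a[a_pos + fwd] == b[b_pos + fwd]):
        fwd += 1
    return a_pos - back, b_pos - back, back + fwd
```

For example, `common_section(b"xxABCDEFyy", b"zzzABCDEFww", 4, 5)` returns `(2, 3, 6)`: the shared run `ABCDEF` starts at offset 2 in the first buffer and offset 3 in the second. Keeping only the largest hashes means the characteristics of two mostly identical chunks tend to coincide, so a small per-chunk set is enough to detect similarity.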
-
Publication No.: DE112012003503T5
Publication Date: 2014-09-25
Application No.: DE112012003503
Application Date: 2012-09-10
Applicant: IBM
Inventor: MEIRI EHUD, KLEIN SHMUEL T, TOAFF YAIR, HIRSCH MICHAEL, ASHER RON, ARONOVICH LIOR
Abstract: Exemplary embodiments of methods, systems, and computer program products for scalable data deduplication that works with small data chunks in a computing environment are provided. In one embodiment, by way of example only, a signature is generated for each of the small data chunks based on a combination of a representation of the characters that occur in the small data chunk with a representation of the frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help select the data to be deduplicated. Additional embodiments of systems and computer program products are disclosed that provide related advantages.
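One way to read "a representation of the characters that occur" combined with "a representation of frequencies" is sketched below in Python. This is an assumption, not the patent's encoding: the character representation is taken to be a 256-bit presence bitmap over byte values, and the frequency representation is a coarse log2 bucket (`int.bit_length`) for each of the `TOP_M` most frequent byte values, where `TOP_M` is a hypothetical parameter.

```python
from collections import Counter

TOP_M = 4  # hypothetical: how many of the most frequent byte values contribute frequency info


def presence_bitmap(chunk: bytes) -> int:
    """256-bit integer with bit b set iff byte value b occurs in the chunk."""
    bits = 0
    for value in set(chunk):
        bits |= 1 << value
    return bits


def frequency_part(chunk: bytes) -> tuple:
    """(byte value, log2 bucket of its count) for the TOP_M most frequent byte values."""
    counts = Counter(chunk)
    # Sort by descending count, then by byte value so that ties are deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:TOP_M]
    return tuple((value, count.bit_length()) for value, count in top)


def signature(chunk: bytes) -> tuple:
    """Combine the character-presence part with the frequency part."""
    return presence_bitmap(chunk), frequency_part(chunk)
```

For example, `signature(b"aab")` returns `((1 << 97) | (1 << 98), ((97, 2), (98, 1)))`. Because both parts ignore byte order and the counts are bucketed coarsely, chunks with similar content can share a signature even when they are not byte-for-byte identical, which is what lets the signature act as a similarity hint rather than an exact fingerprint.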
-
Publication No.: GB2508325A
Publication Date: 2014-05-28
Application No.: GB201406218
Application Date: 2012-09-10
Applicant: IBM
Inventor: ARONOVICH LIOR, ASHER RON, HIRSCH MICHAEL, KLEIN SHMUEL T, MEIRI EHUD, TOAFF YAIR
Abstract: Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunks in a computing environment are provided. In one embodiment, by way of example only, for each of the small data chunks, a signature is generated based on a combination of a representation of characters that appear in the small data chunk with a representation of frequencies of the small data chunk. A signature is generated based on a combination of a representation of characters that appear. The signature is used to help in selecting the data to be deduplicated. Additional system and computer program product embodiments are disclosed and provide related advantages.
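The last step, using the signature "to help in selecting the data to be deduplicated", can be sketched as an index keyed by signature, so that only stored chunks with an equal signature are compared byte-by-byte. Everything here is a hypothetical illustration: the compact `signature` (sorted set of byte values that appear, plus a log2 bucket of the most common byte's count), the `SignatureIndex` class, and the exact-match check standing in for whatever comparison a real system performs.

```python
from collections import Counter, defaultdict


def signature(chunk: bytes) -> tuple:
    """Hypothetical signature: the byte values that appear, plus a coarse
    (log2) bucket of the highest byte-value frequency."""
    counts = Counter(chunk)
    appearing = bytes(sorted(counts))
    top_bucket = max(counts.values(), default=0).bit_length()
    return appearing, top_bucket


class SignatureIndex:
    """Hypothetical index that uses signatures to pick deduplication candidates."""

    def __init__(self):
        self._by_sig = defaultdict(list)  # signature -> list of chunk ids
        self._chunks = {}                 # chunk id -> stored bytes

    def add(self, chunk_id, chunk: bytes) -> None:
        self._chunks[chunk_id] = chunk
        self._by_sig[signature(chunk)].append(chunk_id)

    def candidates(self, chunk: bytes) -> list:
        """Only stored chunks with an equal signature are considered further."""
        # .get() avoids inserting empty lists into the defaultdict on lookup.
        return list(self._by_sig.get(signature(chunk), []))

    def find_duplicate(self, chunk: bytes):
        """Return the id of a stored identical chunk, or None."""
        for cid in self.candidates(chunk):
            if self._chunks[cid] == chunk:
                return cid
        return None
```

With `b"hello world"` stored, the rearranged `b"world hello"` has the same signature and is returned as a candidate, but the byte comparison rejects it as a duplicate. The signature only narrows the search; it does not by itself decide that two chunks are equal.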
-