Data de-duplication between emulated disk sub-systems

    Publication No.: GB2510185A

    Publication Date: 2014-07-30

    Application No.: GB201301542

    Application Date: 2013-01-29

    Applicant: IBM

    Abstract: Two or more disk emulators 106, 108 operate in parallel, each emulating a disk subsystem. Each emulator uses a respective file in a file system for data stored on disk 119, i.e. a disk-image file 112, 114. A separate de-duplicator 118 operates in parallel to the foregoing emulators and uses an additional disk emulator 109 that emulates an additional disk subsystem. The additional emulator uses an additional disk-image file 116 in a file system for storing data shared between the other disk subsystems. Preferably, the de-duplicator uses one or more virtual block-mapping tables to store and retrieve data in the additional file system (fig. 2). The additional file is accessible by all disk emulators. Duplicate data is identified in the respective files, retrieved and stored in the additional file. Duplicate data may be identified using block-by-block comparison, file-aware block comparison or block hashing. In an atomic operation, the duplicate data is deleted in its respective file and replaced with a reference to the respective duplicate data stored in the additional file.
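
The flow in the abstract can be illustrated with a minimal sketch of the block-hashing variant. This is not the patented implementation. Every name below (DiskImage, deduplicate, the ("ref", n) marker, the in-memory lists standing in for disk-image files 112, 114 and the additional file 116) is a hypothetical chosen for illustration, and SHA-256 is an assumed hash. A lock stands in for the atomic replace step.

```python
import hashlib
import threading
from collections import defaultdict

# Hypothetical lock modelling the "atomic operation" from the abstract.
_swap_lock = threading.Lock()


class DiskImage:
    """In-memory stand-in for one disk-image file.

    Each slot holds either raw block bytes or a ("ref", n) tuple that
    points at block n of the shared (additional) image.
    """

    def __init__(self, blocks):
        self.slots = list(blocks)

    def read(self, index, shared):
        """Return the block's bytes, following a reference if present."""
        slot = self.slots[index]
        if isinstance(slot, tuple):
            return shared[slot[1]]
        return slot


def deduplicate(images, shared):
    """Move blocks duplicated across images into `shared`.

    Returns the number of distinct blocks moved.
    """
    # Step 1: identify candidate duplicates by hashing every raw block.
    by_hash = defaultdict(list)
    for image in images:
        for index, slot in enumerate(image.slots):
            if isinstance(slot, bytes):
                digest = hashlib.sha256(slot).digest()
                by_hash[digest].append((image, index))

    moved = 0
    for locations in by_hash.values():
        first_image, first_index = locations[0]
        data = first_image.slots[first_index]
        # Confirm byte-for-byte equality so a hash collision cannot
        # merge blocks that merely share a digest.
        same = [(img, i) for img, i in locations if img.slots[i] == data]
        # Only blocks present in at least two different images count.
        if len({id(img) for img, _ in same}) < 2:
            continue
        # Step 2: store one copy in the shared image, then replace each
        # original with a reference, all under the lock.
        with _swap_lock:
            shared.append(data)
            ref = ("ref", len(shared) - 1)
            for img, i in same:
                img.slots[i] = ref
        moved += 1
    return moved
```

Calling deduplicate([image_a, image_b], shared) with two DiskImage objects and an empty list leaves unique blocks in place and turns each cross-image duplicate into a reference. DiskImage.read resolves such references transparently, which is the role the virtual block-mapping tables play in the abstract.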
