Abstract:
A system and method are provided for allowing more rapid takeover of a failed filer by a clustered takeover partner filer in the presence of a coredump procedure (e.g., a transfer of the failed filer's working memory). To save time, the coredump is allowed to occur contemporaneously with the partner's takeover of the failed filer's regular, active file service disks, so that the takeover need not await completion of the coredump before it begins. Briefly stated, this is accomplished by two techniques. First, the coredump is written to a single disk that is not involved in regular file service, so that takeover of regular file services can proceed without interference from the coredump. Second, a reliable means is provided for both filers in a cluster to identify the coredump disk, which removes the takeover's dependence upon unreliable communication mechanisms.
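The separation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each disk carries a flag marking it as the coredump target; the `Disk` class, the `coredump_in_progress` field, and `plan_takeover` are hypothetical names, not taken from the abstract. The partner takes over every regular file service disk at once and defers only the coredump disk.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    # Hypothetical flag: True while this disk is receiving coredump data.
    coredump_in_progress: bool = False


def plan_takeover(failed_filer_disks):
    """Split the failed filer's disks into two lists of names:
    disks the partner can take over immediately, and disks to defer
    until the coredump has completed."""
    immediate = [d.name for d in failed_filer_disks if not d.coredump_in_progress]
    deferred = [d.name for d in failed_filer_disks if d.coredump_in_progress]
    return immediate, deferred


# Two regular file service disks and one disk receiving the coredump.
disks = [Disk("d0"), Disk("d1"), Disk("spare0", coredump_in_progress=True)]
immediate, deferred = plan_takeover(disks)
print("take over now:", immediate)
print("defer until coredump completes:", deferred)
```

Because the coredump disk is not part of regular file service, deferring it does not delay the resumption of service on the other disks.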
Abstract:
This invention provides a system and method for selecting and communicating a single disk (a "coredump disk") for use in a coredump procedure by a failed file server (or filer). A selection method on the failed filer determines the "best candidate" coredump disk according to a predetermined set of criteria. For example, the available disks that can receive coredump data are located and ordered so as to prefer disks that best match the coredump data size requirement, are least likely to be needed for normal service by the server, and require the least preparation to receive coredump data. Appropriate attributes are written on the selected coredump disk to indicate that a coredump is in progress and where the coredump data is located. Upon reboot of the failed filer (or takeover by a cluster partner), the coredump disk is identified and the coredump data is recovered by reading back those attributes.
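The selection, marking, and recovery steps can be sketched as follows. This is only an illustration under stated assumptions: the abstract lists the three preferences but does not say how they are weighted, so the sketch simply compares them in the order listed. The `Disk` fields, the attribute names, and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    capacity: int            # space available for coredump data
    service_likelihood: int  # assumed scale: 0 = spare, higher = more likely needed
    prep_cost: int           # assumed scale: 0 = ready, higher = more preparation
    attributes: dict = field(default_factory=dict)  # stand-in for on-disk attributes


def select_coredump_disk(disks, dump_size):
    """Return the 'best candidate' disk, or None if no disk is large enough.

    Assumption: criteria are compared lexicographically, in the order
    the abstract lists them (size match, service likelihood, preparation).
    """
    candidates = [d for d in disks if d.capacity >= dump_size]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda d: (d.capacity - dump_size, d.service_likelihood, d.prep_cost),
    )


def mark_coredump(disk, offset):
    """Record that a coredump is in progress and where its data starts."""
    disk.attributes["coredump_in_progress"] = True
    disk.attributes["coredump_offset"] = offset


def find_coredump_disk(disks):
    """On reboot or takeover, identify the coredump disk by reading attributes."""
    for d in disks:
        if d.attributes.get("coredump_in_progress"):
            return d
    return None


disks = [
    Disk("d0", capacity=100, service_likelihood=2, prep_cost=0),
    Disk("s0", capacity=50, service_likelihood=0, prep_cost=1),
    Disk("s1", capacity=50, service_likelihood=0, prep_cost=0),
    Disk("s2", capacity=10, service_likelihood=0, prep_cost=0),
]
chosen = select_coredump_disk(disks, dump_size=40)
mark_coredump(chosen, offset=0)
found = find_coredump_disk(disks)
print(chosen.name, found.name, found.attributes["coredump_offset"])
```

In this example `s2` is too small, `d0` is a poorer size match, and `s1` beats `s0` only on preparation cost. Because the marker lives on the disk itself, the rebooted filer or its cluster partner can locate the coredump by reading disk attributes alone.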