Abstract:
A method and computer program product for tracking network activity within a high performance computing environment are disclosed. An application may be run in the high performance computing environment and a computation within the application may be performed in parallel on more than one processor. When the application is executed, data is gathered about the performance of hardware devices within the high performance computing environment and the clocking signals are adjusted to a global clock. The temporal data may be processed for a hardware device for a defined time period to develop one or more temporal performance metrics. Additionally, all activities that occur on a hardware device for a given time period can be determined and visualized.
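A minimal sketch of the clock adjustment and per-period metric described above, assuming each device reports events stamped with its own local clock and that a per-device offset to the global clock is already known. The names `windowed_bytes`, `offsets_us`, and the event layout are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def to_global_time(local_ts_us, offset_us):
    """Shift a device-local timestamp (microseconds) onto the global clock."""
    return local_ts_us + offset_us


def windowed_bytes(events, offsets_us, window_us):
    """Sum bytes per (device, time window) after aligning to the global clock.

    events: iterable of (device, local_ts_us, nbytes) tuples (assumed layout).
    offsets_us: device -> offset of its local clock from the global clock.
    """
    totals = defaultdict(int)
    for device, local_ts_us, nbytes in events:
        global_ts = to_global_time(local_ts_us, offsets_us[device])
        totals[(device, global_ts // window_us)] += nbytes
    return dict(totals)


events = [
    ("nic0", 200_000, 100),
    ("nic0", 900_000, 50),
    ("nic1", 100_000, 70),
    ("nic0", 1_400_000, 30),
]
offsets_us = {"nic0": 0, "nic1": 1_000_000}  # nic1's clock runs one second behind
metrics = windowed_bytes(events, offsets_us, window_us=1_000_000)
```

Grouping by `(device, window)` also gives the set of activities on one device for a given period, which is the input a visualization step would need.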
Abstract:
A high performance computing system is provided with an ASIC that communicates with another device in the system according to a protocol defined by the other device. The ASIC is coupled to a reconfigurable protocol table, in the form of a high speed content-addressable memory (“CAM”). The CAM includes instructions to control the execution of the protocol by the ASIC. The CAM may include instructions to control the ASIC in the event that unanticipated signals or other errors are encountered while executing the protocol. Internal ASIC state data may be routed to the CAM to permit the ASIC to generate a reasonable response to errors either in the design or fabrication of the ASIC or the device with which it is communicating.
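The reconfigurable protocol table can be pictured in software as a lookup keyed on the current state and the incoming signal. This is only an illustrative model, assuming a dictionary stands in for the CAM and that a default entry handles unanticipated signals; `ProtocolTable`, the state names, and the response names are hypothetical.

```python
class ProtocolTable:
    """Software stand-in for a CAM mapping (state, signal) to an instruction."""

    def __init__(self, default_action):
        self._entries = {}
        self._default = default_action  # used for unanticipated inputs

    def program(self, state, signal, action):
        """Install or overwrite an entry, modelling reconfiguration."""
        self._entries[(state, signal)] = action

    def lookup(self, state, signal):
        return self._entries.get((state, signal), self._default)


def step(table, state, signal):
    """Return (next_state, response) for one protocol event."""
    return table.lookup(state, signal)


table = ProtocolTable(default_action=("ERROR", "send_nack"))
table.program("IDLE", "REQ", ("BUSY", "send_ack"))
table.program("BUSY", "DONE", ("IDLE", "send_complete"))
```

Because the table is data rather than fixed logic, a later call to `program` can change how an unexpected signal is handled without altering the code that executes the protocol.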
Abstract:
A fast mount cache is provided by any offline storage media for fast volume mount access. The fast mount cache may be used as the first level in a hierarchical storage configuration after the high performance tier, for data whose access rate is high shortly after creation but decreases sharply as the data ages. The fast mount cache stores migrated data from online hard disk drive storage and maintains the data on a volume basis as opposed to a file basis. As the fast mount cache capacity fills, or other events occur triggering a volume change, the fast mount cache erases the volume having the oldest data. While data is maintained on the fast mount cache for periods of time soon after it is migrated, the data may be accessed quickly. After the initial period of time has expired, the data exists only on tape storage or low tier storage.
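The volume-based eviction described above might be sketched as follows, assuming capacity is counted in whole volumes and that "oldest data" is judged by migration time. `FastMountCache`, `migrate`, and `locate` are hypothetical names.

```python
class FastMountCache:
    """Keeps whole volumes; evicts the volume holding the oldest data."""

    def __init__(self, capacity_volumes):
        if capacity_volumes < 1:
            raise ValueError("capacity must be at least one volume")
        self.capacity = capacity_volumes
        self.volumes = {}  # volume_id -> (migrated_at, set of file names)

    def migrate(self, volume_id, files, migrated_at):
        """Store a migrated volume; return the ids of any volumes erased."""
        evicted = []
        while volume_id not in self.volumes and len(self.volumes) >= self.capacity:
            oldest = min(self.volumes, key=lambda v: self.volumes[v][0])
            del self.volumes[oldest]
            evicted.append(oldest)
        self.volumes[volume_id] = (migrated_at, set(files))
        return evicted

    def locate(self, volume_id):
        """Say which tier can serve the volume in this simplified model."""
        return "fast-mount-cache" if volume_id in self.volumes else "tape"
```

Tracking age per volume rather than per file keeps the bookkeeping small, at the cost of erasing some files that may still be in demand.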
Abstract:
A method, system and program code are provided for synchronizing scheduler interrupts across multiple nodes of a cluster. Network timers and local scheduling timers are clocked off a system source clock. A processor in each computing node repeatedly reads a network time of day counter. The start of scheduler interrupts is synchronized when the time of day counter is at an integer multiple of a synchronizing integer number of network timer ticks. The processor sends an interprocessor scheduler interrupt to other processors in the node to synchronize scheduling timers in the computing node and throughout the cluster.
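The alignment rule above can be illustrated with a small simulation, assuming the time of day counter is an integer tick count shared by all nodes and that each node polls it until the next multiple of the synchronizing interval. The functions and the simulated counters are hypothetical.

```python
import itertools


def next_sync_tick(now_ticks, interval):
    """First tick at or after now_ticks that is a multiple of interval."""
    return ((now_ticks + interval - 1) // interval) * interval


def wait_for_sync(read_counter, interval):
    """Poll the time of day counter until the next sync boundary is reached."""
    target = next_sync_tick(read_counter(), interval)
    while read_counter() < target:
        pass  # a real node would spin on the hardware counter here
    return target


# Two simulated nodes that begin polling at different times.
counter_a = itertools.count(1003)
counter_b = itertools.count(1041)
start_a = wait_for_sync(lambda: next(counter_a), 100)
start_b = wait_for_sync(lambda: next(counter_b), 100)
```

Both nodes arrive at tick 1100 even though they started polling at different moments, which is what lets the scheduler interrupts line up across the cluster.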
Abstract:
Embodiments of the invention include software that provides an operator or a system service the ability to access, control, or configure a plurality of different data center resources using common sets of functions or commands even though those data center resources natively require different commands to access, control, or configure them. The invention is configured to accept common commands and then translate them from a common command format into device specific commands or command sets. The invention simplifies how data center equipment is controlled and configured.
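One way to picture the translation step is a table that maps a common command to a device-specific command sequence. The device types and command strings below are invented for illustration and do not correspond to any real product's interface.

```python
# Hypothetical device types and native command templates.
DEVICE_COMMANDS = {
    "example_pdu": {
        "power_on": ["outlet {port} on"],
        "power_off": ["outlet {port} off"],
    },
    "example_server": {
        "power_on": ["chassis power up"],
        "power_off": ["chassis power soft", "chassis power down"],
    },
}


def translate(device_type, common_command, **params):
    """Translate one common command into a device-specific command list."""
    try:
        templates = DEVICE_COMMANDS[device_type][common_command]
    except KeyError:
        raise ValueError(
            f"no translation for {common_command!r} on {device_type!r}"
        ) from None
    return [template.format(**params) for template in templates]
```

For example, `translate("example_pdu", "power_off", port=3)` yields a single native command, while the same common command on `example_server` expands to a two-step sequence.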
Abstract:
Data state rollover is performed based on data state snapshots and deltas. A series of snapshots is taken of the current data state, an original data state, and data states in between. Deltas are then generated between each pair of sequential snapshots, yielding a series of deltas that each represent the difference between consecutive snapshots. Once the deltas are acquired, they may be stored along with the snapshot of the present data state. As such, a previous data state may be restored by determining the number of deltas to apply to the current data state to reach the desired previous state. In cases where the rollback or rollover fails, deltas may be played against the current data state to a point where the last known trusted and working data point existed.
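A minimal sketch of rolling back through stored deltas, assuming each data state is a flat dictionary and each delta records what must change to step one snapshot back in time. The helper names and the sample states are hypothetical.

```python
_MISSING = object()  # marks a key that did not exist in the older state


def make_delta(newer, older):
    """Record the changes that turn `newer` back into `older`."""
    delta = {}
    for key in newer.keys() | older.keys():
        old_value = older.get(key, _MISSING)
        if newer.get(key, _MISSING) != old_value:
            delta[key] = old_value
    return delta


def apply_delta(state, delta):
    """Return a copy of `state` with one delta applied."""
    result = dict(state)
    for key, value in delta.items():
        if value is _MISSING:
            result.pop(key, None)
        else:
            result[key] = value
    return result


def roll_back(current, deltas, steps):
    """Apply the first `steps` deltas (newest first) to the current state."""
    state = current
    for delta in deltas[:steps]:
        state = apply_delta(state, delta)
    return state


snapshots = [
    {"version": 1, "mode": "safe"},
    {"version": 2, "mode": "safe", "cache": True},
    {"version": 3, "mode": "fast", "cache": True},
]
current = snapshots[-1]
# Only the current snapshot and these deltas need to be kept.
deltas = [
    make_delta(snapshots[i], snapshots[i - 1])
    for i in range(len(snapshots) - 1, 0, -1)
]
```

Choosing how far to roll back is then just choosing `steps`: one delta returns to the second state, two deltas return to the original.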
Abstract:
Error data is read from error registers and written into a buffer. A computing node uses a BIOS to read the error data, rearm the error register and write the data into a memory mapped buffer. A hub chip supports creation of a shared memory system of computing nodes. A management controller in the computing node extracts error data from the buffer. The error data preferably consists essentially of the error register identifiers and the contents of the error registers. A system management node receives the error data from the management controllers in the computing nodes. The system management node may be coupled to but separate from the computing nodes.
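The flow from error register to buffer to management controller could be modelled roughly as below. This is a software analogy only, assuming a register latches the first error until it is rearmed and that the buffer is a simple list; the class and function names are hypothetical.

```python
class ErrorRegister:
    """Latches the first error value until it is read and rearmed."""

    def __init__(self, reg_id):
        self.reg_id = reg_id
        self.value = 0
        self.armed = True

    def latch(self, value):
        if self.armed:  # later errors are ignored until rearm
            self.value = value
            self.armed = False

    def read_and_rearm(self):
        value = self.value
        self.value = 0
        self.armed = True
        return value


def bios_collect(registers, buffer):
    """BIOS step: copy (register id, contents) into the buffer and rearm."""
    for reg in registers:
        if not reg.armed:
            buffer.append((reg.reg_id, reg.read_and_rearm()))


def controller_extract(buffer):
    """Management controller step: drain the buffer for forwarding."""
    records = list(buffer)
    buffer.clear()
    return records
```

Each record carries only the register identifier and its contents, matching the minimal error data the abstract describes.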
Abstract:
A computer system with read/write access to storage devices creates a snapshot of a data volume at a point in time while continuing to accept access requests to the mirrored data volume by copying before making changes to the base data volume. Multiple snapshots may be made of the same data volume at different points in time. Only data that is not stored in a previous snapshot volume or in the base data volume is stored in the most recent snapshot volume.
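A copy-before-write snapshot scheme of this general kind can be sketched as follows, assuming the volume is a list of blocks and each snapshot stores only the blocks that were overwritten while it was the most recent snapshot. `CowVolume` and its methods are hypothetical names.

```python
class CowVolume:
    """Base volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.base = list(blocks)
        self.snapshots = []  # each entry: block index -> preserved old data

    def snapshot(self):
        """Create an empty snapshot volume and return its id."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        """Preserve the old block in the newest snapshot, then overwrite."""
        if self.snapshots:
            latest = self.snapshots[-1]
            if index not in latest:
                latest[index] = self.base[index]
        self.base[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it was when snapshot `snap_id` was taken."""
        for snap in self.snapshots[snap_id:]:
            if index in snap:
                return snap[index]
        return self.base[index]


vol = CowVolume(["a", "b", "c"])
s0 = vol.snapshot()
vol.write(0, "A")
s1 = vol.snapshot()
vol.write(0, "AA")
vol.write(1, "B")
```

A read of an older snapshot falls through to newer snapshots and finally to the base volume, so unchanged blocks are never duplicated.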
Abstract:
Embodiments of the invention relate to a system and method for dynamically scheduling resources using policies to self-optimize resource workloads in a data center. The object of the invention is to allocate resources in the data center dynamically according to a set of policies that are configured by an administrator. Operational parametrics that correlate to the cost of ownership of the data center are monitored and compared to the set of policies configured by the administrator. When the operational parametrics approach or exceed levels that correspond to the set of policies, workloads in the data center are adjusted with the goal of minimizing the cost of ownership of the data center. Such parametrics include, but are not limited to, those that relate to resiliency, power balancing, power consumption, power management, error rate, maintenance, and performance.
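As a rough illustration of comparing a monitored parametric against a policy level, the sketch below assumes power draw is the only parametric, that "approaching" means reaching 90% of the limit, and that the adjustment is to move work to the node drawing the least power. All names and thresholds are hypothetical.

```python
def approaching(value, limit, margin=0.9):
    """True when a parametric has reached `margin` of its policy limit."""
    return value >= margin * limit


def plan_migrations(node_power_watts, power_limit_watts, margin=0.9):
    """Return (source, target) moves away from nodes near the power limit.

    The target is always the node currently drawing the least power.
    """
    coolest = min(node_power_watts, key=node_power_watts.get)
    moves = []
    for node, watts in sorted(node_power_watts.items()):
        if node != coolest and approaching(watts, power_limit_watts, margin):
            moves.append((node, coolest))
    return moves
```

With readings of 950 W, 400 W, and 700 W against a 1000 W policy, only the first node crosses the 90% margin, so a single move toward the 400 W node is planned.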