Abstract:
A cluster of computer system nodes connected by a storage area network includes two classes of nodes. Nodes in the first class can act as either clients or servers, while nodes in the second class can act only as clients. The client-only nodes require much less functionality and can therefore be supported more easily on different operating systems. To minimize the amount of data transmitted during normal operation, the server responsible for maintaining a cluster configuration database repeatedly multicasts its IP address, its incarnation number and the most recent database generation number. Each node stores this information, and when it detects a change, it can request an update of the data it needs. A client-only node uses the server's IP address to connect to the server, download the information it requires from the cluster database, and upload its local disk connectivity information.
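A minimal sketch of how a client-only node might react to the server's periodic multicast. It assumes that a changed IP address or incarnation number means a new or restarted server, which calls for a full reconnect. It also assumes that a changed generation number alone means only the database contents changed. The class names and action strings are hypothetical placeholders, not the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Announcement:
    """One periodic multicast from the configuration-database server."""
    server_ip: str
    incarnation: int  # assumed to change when the server restarts
    generation: int   # assumed to be bumped on every database change


class ClientOnlyNode:
    def __init__(self) -> None:
        self.last: Optional[Announcement] = None

    def on_multicast(self, msg: Announcement) -> List[str]:
        """Return the (hypothetical) actions triggered by this announcement."""
        prev, self.last = self.last, msg
        if prev is None or (prev.server_ip, prev.incarnation) != (msg.server_ip, msg.incarnation):
            # New or restarted server: reconnect, resync, and re-report local disks.
            return [f"connect {msg.server_ip}", "download config", "upload disk connectivity"]
        if prev.generation != msg.generation:
            # Same server, newer database: fetch only the data this node needs.
            return ["download config"]
        # Nothing changed: no traffic beyond the multicast itself.
        return []
```

In steady state, every announcement yields an empty action list. That is the point of the design: the only recurring traffic is the small multicast itself.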
Abstract:
Data visualization that interactively rotates data about a particular axis, or translates data in a particular plane, based on input received outside the axis space. Data to be visualized is accessed by a data visualization application; the data may be structured or unstructured, and may be filtered and analyzed. The accessed data may be displayed to a user through an interface of the visualization application, together with the coordinate system used to display it. A user may rotate the data about a particular axis of the coordinate system, or translate it in a particular plane, by providing a continuous input within a graphics portion of the interface. The input may be associated with a virtual trackball.
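The sketch below shows single-axis rotation and in-plane translation driven by a drag anywhere in the graphics area. It assumes that dragging across the full viewport width produces a half turn. That mapping, and the function names, are illustrative assumptions rather than the method described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotation_matrix(axis: str, angle: float) -> List[List[float]]:
    """Right-handed rotation matrix about the x, y or z axis."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == "z":
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError(f"unknown axis {axis!r}")


def drag_to_angle(drag_pixels: float, viewport_pixels: float) -> float:
    """Assumed mapping: a drag across the whole viewport is a half turn (pi)."""
    return math.pi * drag_pixels / viewport_pixels


def rotate(points: Sequence[Point], axis: str, angle: float) -> List[Point]:
    """Rotate every point about one coordinate axis only."""
    m = rotation_matrix(axis, angle)
    return [tuple(sum(m[i][j] * p[j] for j in range(3)) for i in range(3)) for p in points]


def translate_in_plane(points: Sequence[Point], plane: str, du: float, dv: float) -> List[Point]:
    """Move points within one coordinate plane; the third coordinate is untouched."""
    i, j = {"xy": (0, 1), "yz": (1, 2), "xz": (0, 2)}[plane]
    moved = []
    for p in points:
        q = list(p)
        q[i] += du
        q[j] += dv
        moved.append((q[0], q[1], q[2]))
    return moved
```

Constraining the drag to one axis or one plane is what distinguishes this from a free trackball. The drag position itself never has to land on the drawn axis.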
Abstract:
A method and computer program product for tracking network activity within a high performance computing environment is disclosed. An application may be run in the high performance computing environment, and a computation within the application may be performed in parallel on more than one processor. When the application is executed, data is gathered about the performance of hardware devices within the high performance computing environment, and the clocking signals are adjusted to a global clock. The resulting temporal data may be processed for a hardware device over a defined time period to develop one or more temporal performance metrics. Additionally, all activities that occur on a hardware device during a given time period can be determined and visualized.
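Here is a sketch of the two steps above: aligning per-device timestamps to a global clock, then computing a metric over a time window. It assumes each device's offset from the global clock is already known. It also assumes that activities on one device do not overlap. The device names and the "busy fraction" metric are illustrative choices.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Activity:
    device: str
    start: float  # seconds
    end: float


def to_global_clock(activities: List[Activity], offsets: Dict[str, float]) -> List[Activity]:
    """Shift each device's local timestamps by its (assumed known) offset."""
    return [Activity(a.device, a.start + offsets[a.device], a.end + offsets[a.device])
            for a in activities]


def activities_in_window(activities: List[Activity], device: str,
                         t0: float, t1: float) -> List[Activity]:
    """All activities on `device` that overlap the window [t0, t1)."""
    return [a for a in activities if a.device == device and a.start < t1 and a.end > t0]


def busy_fraction(activities: List[Activity], device: str, t0: float, t1: float) -> float:
    """Fraction of [t0, t1) the device was busy; assumes non-overlapping activities."""
    busy = sum(min(a.end, t1) - max(a.start, t0)
               for a in activities_in_window(activities, device, t0, t1))
    return busy / (t1 - t0)
```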
Abstract:
A high performance computing system is provided with an ASIC that communicates with another device in the system according to a protocol defined by the other device. The ASIC is coupled to a reconfigurable protocol table, in the form of a high speed content-addressable memory (“CAM”). The CAM includes instructions to control the execution of the protocol by the ASIC. The CAM may include instructions to control the ASIC in the event that unanticipated signals or other errors are encountered while executing the protocol. Internal ASIC state data may be routed to the CAM to permit the ASIC to generate a reasonable response to errors either in the design or fabrication of the ASIC or the device with which it is communicating.
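To show why a reconfigurable table helps, here is a software model of a ternary CAM-style protocol table. It assumes the lookup key packs the ASIC's current state and the incoming signal into one integer. Entries match under a mask, the first match wins, and a final mask-zero entry catches unanticipated inputs. The bit layout, state numbers, and action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CamEntry:
    value: int
    mask: int    # 1 bits must match; 0 bits are "don't care"
    action: str


def make_key(state: int, signal: int) -> int:
    """Hypothetical encoding: 4-bit state in the high nibble, 4-bit signal in the low nibble."""
    if not (0 <= state < 16 and 0 <= signal < 16):
        raise ValueError("state and signal must each fit in 4 bits")
    return (state << 4) | signal


class ProtocolCam:
    def __init__(self, entries: List[CamEntry]) -> None:
        self.entries = list(entries)

    def reprogram(self, index: int, entry: CamEntry) -> None:
        """Replace one entry at run time, without changing the ASIC logic."""
        self.entries[index] = entry

    def lookup(self, key: int) -> str:
        # First (highest-priority) entry whose masked bits match wins.
        for e in self.entries:
            if (key & e.mask) == (e.value & e.mask):
                return e.action
        raise LookupError(f"no entry matches key {key:#x}")
```

Keeping a catch-all entry last means an unexpected signal maps to a defined recovery action instead of undefined behavior. Reprogramming an entry can also work around a design or fabrication error after the ASIC is built.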
Abstract:
A fast mount cache, which may be provided by any offline storage medium, enables fast volume mount access. The fast mount cache may be used as the first level in a hierarchical storage configuration after the high performance tier, for data whose access rate is high shortly after creation but decreases sharply as the data ages. The fast mount cache stores data migrated from online hard disk drive storage and maintains it on a per-volume rather than a per-file basis. As the fast mount cache fills, or when other events trigger a volume change, the fast mount cache erases the volume holding the oldest data. While data remains on the fast mount cache during the period soon after migration, it can be accessed quickly. After that initial period expires, the data exists only on tape storage or other lower-tier storage.
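The sketch below models volume-granular eviction. Migrated files fill the current volume. When a new volume is needed and the cache is full, the whole oldest volume is erased. It assumes every migrated file also has a copy on tape, so erasing a cache volume loses nothing. It also assumes each file is smaller than one volume. All names are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class FastMountCache:
    def __init__(self, capacity_bytes: int, volume_bytes: int) -> None:
        self.volume_bytes = volume_bytes
        self.max_volumes = max(1, capacity_bytes // volume_bytes)
        # volume id -> file names; insertion order doubles as age order
        self.volumes: "OrderedDict[int, List[str]]" = OrderedDict()
        self.current: Optional[int] = None
        self.current_used = 0
        self._next_id = 0

    def migrate(self, name: str, size: int) -> None:
        """Copy a file from disk into the current volume (tape copy assumed elsewhere)."""
        if self.current is None or self.current_used + size > self.volume_bytes:
            self._open_new_volume()
        self.volumes[self.current].append(name)
        self.current_used += size

    def _open_new_volume(self) -> None:
        if len(self.volumes) >= self.max_volumes:
            self.volumes.popitem(last=False)  # erase the volume holding the oldest data
        self.current, self._next_id = self._next_id, self._next_id + 1
        self.volumes[self.current] = []
        self.current_used = 0

    def in_fast_cache(self, name: str) -> bool:
        """False means the file is now available only from tape or lower tiers."""
        return any(name in files for files in self.volumes.values())
```

Evicting whole volumes rather than individual files keeps the bookkeeping trivial. It also matches the per-volume layout of offline media.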
Abstract:
A method, system and program code for synchronizing scheduler interrupts across multiple nodes of a cluster. Network timers and local scheduling timers are clocked off a system source clock. A processor in each computing node repeatedly reads a network time-of-day counter. Scheduler interrupts are started in synchrony when the time-of-day counter reaches an integer multiple of a chosen synchronizing integer number of network timer ticks. The processor then sends an interprocessor scheduler interrupt to the other processors in the node, synchronizing the scheduling timers within the computing node and throughout the cluster.
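A sketch of the synchronization step: compute the next counter value that is a multiple of the synchronizing interval, poll until it is reached, then notify the other processors. `read_tod` and `send_ipi` are hypothetical stand-ins for reading the network counter and sending an interprocessor interrupt.

```python
from typing import Callable, Iterable


def next_sync_tick(now: int, interval: int) -> int:
    """Smallest tick value >= now that is an integer multiple of interval."""
    return -(-now // interval) * interval  # ceiling division without floats


def synchronize_node(read_tod: Callable[[], int], interval: int,
                     other_cpus: Iterable[int], send_ipi: Callable[[int], None]) -> int:
    """Wait for the next sync boundary, then start scheduler ticks on the other CPUs."""
    target = next_sync_tick(read_tod(), interval)
    t = read_tod()
    while t < target:  # '>=' exit tolerates a poll that skips the exact boundary tick
        t = read_tod()
    for cpu in other_cpus:
        send_ipi(cpu)  # hypothetical hook: start that CPU's scheduling timer
    return t
```

Every node derives the same boundary from the shared network counter. So nodes that never communicate with each other still start their scheduler interrupts on the same tick, within polling latency.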
Abstract:
Embodiments of the invention include software that gives an operator or a system service the ability to access, control, or configure a plurality of different data center resources using common sets of functions or commands, even though those resources natively require different commands to access, control, or configure them. The invention accepts common commands and translates them from a common command format into device-specific commands or command sets, which simplifies how data center equipment is controlled and configured.
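A minimal sketch of the translation layer: a table maps each device type and common command to a function that produces that device's native command set. Every device type and native command string here is invented for illustration. None of them corresponds to a real product's syntax.

```python
from typing import Callable, Dict, List

# Hypothetical device types and native command syntaxes, for illustration only.
TRANSLATORS: Dict[str, Dict[str, Callable[..., List[str]]]] = {
    "pdu_type_a": {
        "power_on": lambda port: [f"set outlet {port} on"],
        "power_off": lambda port: [f"set outlet {port} off"],
    },
    "switch_type_b": {
        # One common command can expand to a multi-step native command set.
        "power_on": lambda port: ["enter config", f"port {port} enable", "commit"],
        "power_off": lambda port: ["enter config", f"port {port} disable", "commit"],
    },
}


def translate(common_command: str, device_type: str, **params) -> List[str]:
    """Translate a common command into the device-specific command set."""
    try:
        handler = TRANSLATORS[device_type][common_command]
    except KeyError:
        raise ValueError(f"{common_command!r} is not supported for {device_type!r}") from None
    return handler(**params)
```

Supporting a new kind of equipment then means adding one table entry. The operator-facing commands do not change.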
Abstract:
Data state rollover is performed using data state snapshots and deltas. A series of snapshots is taken of the current data state, an original data state, and the data states in between. A delta is then generated between each pair of sequential snapshots, yielding a set of deltas that represent the differences between consecutive snapshots. Once the deltas are generated, they may be stored along with the snapshot of the present data state. A previous data state can then be restored by determining how many deltas must be applied to the current data state to reach the desired previous state. If the rollback or rollover fails, deltas may be played against the current data state up to the last known trusted and working data point.
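The sketch below stores only the current snapshot plus one reverse delta per consecutive snapshot pair. Rolling back k states means applying the last k deltas, newest first. It assumes the data state can be modeled as a flat key/value dictionary. The delta format, a set of keys to assign plus a set of keys to delete, is an illustrative choice.

```python
from typing import Any, Dict, List, Set, Tuple

State = Dict[str, Any]
Delta = Tuple[Dict[str, Any], Set[str]]  # (keys to assign, keys to delete)


def make_delta(newer: State, older: State) -> Delta:
    """Delta that turns `newer` back into `older`."""
    to_set = {k: v for k, v in older.items() if k not in newer or newer[k] != v}
    to_delete = {k for k in newer if k not in older}
    return to_set, to_delete


def apply_delta(state: State, delta: Delta) -> State:
    to_set, to_delete = delta
    result = {k: v for k, v in state.items() if k not in to_delete}
    result.update(to_set)
    return result


class SnapshotHistory:
    def __init__(self, snapshots: List[State]) -> None:  # oldest first
        self.current = dict(snapshots[-1])
        # deltas[i] turns snapshot i + 1 back into snapshot i
        self.deltas = [make_delta(snapshots[i + 1], snapshots[i])
                       for i in range(len(snapshots) - 1)]

    def rollback(self, steps: int) -> State:
        """Return the state `steps` snapshots before the current one."""
        if not 0 <= steps <= len(self.deltas):
            raise ValueError("cannot roll back that far")
        state = dict(self.current)
        for delta in reversed(self.deltas[len(self.deltas) - steps:]):
            state = apply_delta(state, delta)
        return state
```

Under the same model, recovery from a failed rollover is just `rollback(k)` with k chosen to land on the last snapshot known to be good.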
Abstract:
Error data is read from error registers and written into a buffer. A computing node uses a BIOS to read the error data, rearm the error registers and write the data into a memory-mapped buffer. A hub chip supports creation of a shared memory system of computing nodes. A management controller in the computing node extracts the error data from the buffer. The error data preferably consists essentially of the error register identifiers and the contents of the error registers. A system management node receives the error data from the management controllers in the computing nodes. The system management node may be coupled to, but separate from, the computing nodes.
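Here is a sketch of the three-stage flow: BIOS to buffer, buffer to management controller, and controllers to system management node. It assumes a nonzero register value means an error has latched. It also models "rearm" as clearing that value. A plain list stands in for the memory-mapped buffer. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (error register identifier, register contents)


@dataclass
class ErrorRegister:
    reg_id: int
    value: int = 0  # nonzero is assumed to mean an error has been latched


@dataclass
class MappedBuffer:
    """Stands in for the memory-mapped buffer shared by BIOS and the controller."""
    records: List[Record] = field(default_factory=list)


def bios_collect(registers: List[ErrorRegister], buf: MappedBuffer) -> None:
    for reg in registers:
        if reg.value:
            data = (reg.reg_id, reg.value)  # read
            reg.value = 0                   # rearm so the next error can latch
            buf.records.append(data)        # write only identifier + contents


def controller_drain(buf: MappedBuffer) -> List[Record]:
    """Management controller extracts and clears the buffered records."""
    records, buf.records = buf.records, []
    return records


def management_node_collect(reports: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Aggregate per-node reports, keeping only nodes that reported errors."""
    return {node: recs for node, recs in reports.items() if recs}
```

Recording only identifiers and raw contents keeps the BIOS step small and fast. Interpreting the values is left to the system management node.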