Abstract:
A computer system is disclosed that involves multiple communicatively interconnected computers, a Monitoring and Control Program (MCP) on each node, wherein each MCP is communicatively interconnected to other MCPs, wherein at least one of the MCPs acts as a controlling MCP, wherein the controlling MCP will execute, and communicate, operating system-independent MCP control language commands to other MCPs to at least cause the other MCPs to monitor execution of the transactions of the production workloads across the nodes, on a per-transaction basis, with each MCP monitoring individual transaction execution on its node in real-time, and wherein, in conjunction with the monitoring, the MCPs will collectively generate a transaction table, on a node and transaction basis, detailing parametric information regarding the execution of the transactions across the nodes, with at least one of the MCPs effecting storage of the transaction table.
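The collectively generated transaction table can be pictured as a mapping keyed by node and transaction. The following is only a minimal sketch: the report layout, the field names (`txn`, `elapsed_ms`, `cpu_ms`), and the function `build_transaction_table` are hypothetical and do not come from the disclosure.

```python
def build_transaction_table(reports):
    """Merge per-node monitoring reports into one table.

    `reports` maps a node name to the list of per-transaction records
    that the node's MCP observed (hypothetical layout).
    """
    table = {}
    for node, records in reports.items():
        for record in records:
            # Key on (node, transaction) so the same transaction can
            # appear once per node it executed on.
            key = (node, record["txn"])
            table[key] = {k: v for k, v in record.items() if k != "txn"}
    return table


reports = {
    "node-a": [{"txn": "T1", "elapsed_ms": 12, "cpu_ms": 4}],
    "node-b": [{"txn": "T1", "elapsed_ms": 30, "cpu_ms": 9},
               {"txn": "T2", "elapsed_ms": 7, "cpu_ms": 2}],
}
table = build_transaction_table(reports)
```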
Abstract:
A method is disclosed for archiving monitoring data while enabling real-time analysis within a live database. A processor receives a set of parameters, which is selected based on monitoring data of a system application. Based on the set of parameters, the processor determines a historical schema of monitoring data and a rolling retention period for a current schema of monitoring data. The processor performs an impact analysis of the historical schema and retention period of the monitoring data. Upon acceptance of the impact analysis, the processor generates the historical schema, which is applied to a table of monitoring data populated by copying monitoring data from the current schema to the historical schema within the live database. Based on the rolling retention period of the current schema, the processor removes monitoring data that exceeds that period from both the current schema and the historical schema.
Abstract:
Event counter checkpointing and restoring is disclosed. In one implementation, a processor includes a first event counter to count events that occur during execution within the processor, event counter checkpoint logic, communicably coupled with the first event counter, to store, prior to a transactional execution of the processor, a value of the first event counter, a second event counter to count events prior to and during the transactional execution, wherein the second event counter is to increment without resetting after the transactional execution is aborted, event count restore logic to restore the first event counter to the stored value after the transactional execution is aborted, and tuning logic to determine, in response to aborting of the transactional execution, a number of the events that occurred during the transactional execution based on the stored value of the first event counter and a value of the second event counter.
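The checkpoint, restore, and tuning steps can be modeled in software as a toy example. This is a sketch only, not the disclosed hardware logic: the class and method names are illustrative, and it assumes both counters hold the same value when the transaction begins, so the difference between the second counter and the stored value equals the events lost to the abort.

```python
class CheckpointedCounters:
    """Toy software model of the two event counters (names are illustrative)."""

    def __init__(self):
        self.first = 0      # restored to the checkpoint on abort
        self.second = 0     # keeps counting and is never reset
        self.stored = None  # checkpointed value of `first`

    def record_event(self, n=1):
        # Both counters observe every event.
        self.first += n
        self.second += n

    def begin_transaction(self):
        # Checkpoint logic: save `first` before transactional execution.
        self.stored = self.first

    def abort_transaction(self):
        # Tuning logic: events that occurred during the aborted transaction
        # (assumes first == second when the transaction began).
        events_in_transaction = self.second - self.stored
        # Restore logic: roll `first` back to the checkpoint.
        self.first = self.stored
        return events_in_transaction


counters = CheckpointedCounters()
counters.record_event(5)
counters.begin_transaction()
counters.record_event(3)
lost = counters.abort_transaction()
```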
Abstract:
Embodiments of the invention relate to implementing run-time instrumentation sampling in transactional-execution mode. An aspect of the invention includes a method for implementing run-time instrumentation sampling in transactional-execution mode. The method includes determining, by a processor, that the processor is configured to execute instructions of an instruction stream in a transactional-execution mode, the instructions defining a transaction. The method also includes interlocking completion of storage operations of the instructions to prevent instruction-directed storage until completion of the transaction. The method further includes recognizing a sample point during execution of the instructions while in the transactional-execution mode. The method additionally includes run-time-instrumentation-directed storing, upon successful completion of the transaction, run-time instrumentation information obtained at the sample point.
Abstract:
In various embodiments, methods and systems for implementing multiple transaction logs in a distributed storage system are provided. A log stream component detects performance metrics of a plurality of log streams. The performance metrics are associated with requests from partitions in the distributed storage system. A transaction component receives a request to execute a transaction using a log stream. The request is received from a partition of the distributed storage system. The performance metrics of the plurality of log streams can be referenced, where the performance metrics indicate a performance capacity of a selected log stream to process the request. A log stream for executing the transaction is determined based on the performance capacity. The selection of the log stream can also factor in attributes of the request. The transaction component communicates the request to be executed, using the selected log stream to perform the transaction.
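Choosing a log stream from performance metrics and a request attribute could look roughly like the sketch below. The metrics (`pending_requests`, `avg_latency_ms`), the request attribute (`request_size`), and the cost formula are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LogStream:
    name: str
    pending_requests: int  # hypothetical metric: queued requests
    avg_latency_ms: float  # hypothetical metric: recent append latency


def select_log_stream(streams, request_size=1):
    """Return the stream with the lowest estimated cost for this request."""
    def estimated_cost(stream):
        # Assumed cost model: queue depth after adding the request,
        # scaled by the stream's recent latency.
        return (stream.pending_requests + request_size) * stream.avg_latency_ms

    return min(streams, key=estimated_cost)


streams = [
    LogStream("stream-a", pending_requests=10, avg_latency_ms=2.0),
    LogStream("stream-b", pending_requests=2, avg_latency_ms=5.0),
    LogStream("stream-c", pending_requests=4, avg_latency_ms=1.0),
]
chosen = select_log_stream(streams, request_size=1)
```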
Abstract:
A method for automated detection of a real IT system problem may include obtaining monitor measurements of metrics associated with activities of a plurality of configuration items of the IT system. The method may also include detecting anomalies in the monitor measurements. The method may further include grouping concurrent anomalies of the detected anomalies corresponding to configuration items of the plurality of configuration items which are topologically linked to be regarded as a system anomaly. The method may further include calculating a significance score for the system anomaly, and determining that the system anomaly relates to a real system problem based on the calculated significance score.
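The grouping and scoring steps can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: anomalies are assumed to arrive already detected as `(configuration item, time window, score)` tuples, "concurrent" is taken to mean "same time window", and the significance score is simply the sum of member scores compared against a hypothetical threshold.

```python
from collections import defaultdict


def group_anomalies(anomalies, links):
    """Group same-window anomalies on topologically linked configuration items."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    by_window = defaultdict(dict)
    for ci, window, score in anomalies:
        by_window[window][ci] = score

    groups = []
    for window, scores in sorted(by_window.items()):
        unseen = set(scores)
        while unseen:
            start = min(unseen)  # deterministic starting point
            unseen.remove(start)
            stack, members = [start], []
            while stack:  # walk linked items that are also anomalous
                ci = stack.pop()
                members.append(ci)
                for nb in neighbors[ci] & unseen:
                    unseen.remove(nb)
                    stack.append(nb)
            groups.append({
                "window": window,
                "cis": sorted(members),
                "significance": sum(scores[ci] for ci in members),
            })
    return groups


def real_problems(groups, threshold):
    """Keep only system anomalies whose score reaches the threshold."""
    return [g for g in groups if g["significance"] >= threshold]


anomalies = [("web", 0, 3.0), ("db", 0, 4.0), ("cache", 0, 1.0), ("web", 1, 2.0)]
links = [("web", "db")]
groups = group_anomalies(anomalies, links)
problems = real_problems(groups, threshold=5.0)
```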
Abstract:
An application that utilizes a single thread is monitored, and context is provided for the individual requests and business transactions operating on that platform. A wrapper is placed around an object that calls a request. The wrapper renames the request object: each request object is given a unique name when it is called. When a call stack is sampled, the sampler retrieves the unique name. Performance data associated with the unique name may be correlated to a business transaction and a particular request as metrics are subsequently analyzed. For subsequent reporting, such as call graph reporting, the report or call graph shows requests in the context of a particular business transaction and a particular request instance. This provides more context in reporting of a business application request for frameworks that utilize a single thread for multiple requests.
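One way to make a unique name visible to a stack sampler is sketched below in Python (3.8 or later), which stands in for whatever platform the disclosure targets. The helper names `wrap_request` and `sample_stack_names` and the naming scheme are hypothetical, and the "sampler" here is simply called from inside the handler rather than running asynchronously.

```python
import itertools
import traceback

_request_ids = itertools.count(1)


def wrap_request(handler, transaction):
    """Wrap `handler` in a function whose stack-frame name is unique."""
    unique = f"{transaction}__req{next(_request_ids)}"

    def wrapper(*args, **kwargs):
        return handler(*args, **kwargs)

    # Rename the code object so sampled stack frames show the unique name.
    wrapper.__code__ = wrapper.__code__.replace(co_name=unique)
    wrapper.__name__ = unique
    return wrapper, unique


def sample_stack_names():
    """Stand-in for a call-stack sampler: function names on the current stack."""
    return [frame.name for frame in traceback.extract_stack()]


seen = []


def handler():
    seen.extend(sample_stack_names())


wrapped, unique_name = wrap_request(handler, "checkout")
wrapped()
```

Because the unique name embeds the business transaction and a per-request counter, a sampled frame name can be mapped back to both during later analysis.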
Abstract:
Systems and methods for identifying failed customer experience in distributed computer systems. An example method may comprise: receiving, by a processing device of a distributed computer system, a first application layer message associated with a request originated by a client computer system responsive to an action by a user, wherein the first application layer message comprises a transaction identifier identifying a sequence of messages originated by one or more components of the distributed computer system and associated with the request; identifying a pre-defined byte pattern comprised by the first application layer message; and identifying, based on the pre-defined byte pattern, at least one of: a system error associated with the transaction or an application error associated with the transaction.
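Matching pre-defined byte patterns against an application layer message might be sketched like this. The specific patterns, the message layout, and the function `classify_message` are assumptions chosen for illustration; the abstract does not say which byte patterns are used.

```python
# Hypothetical byte patterns; a real deployment would define its own.
SYSTEM_ERROR_PATTERNS = (b"HTTP/1.1 500", b"HTTP/1.1 503")
APPLICATION_ERROR_PATTERNS = (b'"status":"error"',)


def classify_message(transaction_id, payload):
    """Tag a message with the kind of error its byte patterns indicate."""
    if any(pattern in payload for pattern in SYSTEM_ERROR_PATTERNS):
        kind = "system_error"
    elif any(pattern in payload for pattern in APPLICATION_ERROR_PATTERNS):
        kind = "application_error"
    else:
        kind = None
    # The transaction identifier ties the finding back to the user's request.
    return {"transaction_id": transaction_id, "error": kind}


system = classify_message("txn-42", b"HTTP/1.1 503 Service Unavailable\r\n\r\n")
app = classify_message("txn-43", b'HTTP/1.1 200 OK\r\n\r\n{"status":"error"}')
ok = classify_message("txn-44", b'HTTP/1.1 200 OK\r\n\r\n{"status":"ok"}')
```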
Abstract:
A system monitors a network or web application provided by one or more distributed applications and provides data for each and every method instance in an efficient low-cost manner. The web application may be provided by one or more web services each implemented as a virtual machine or one or more applications implemented on a virtual machine. Agents may be installed on one or more servers at an application level, virtual machine level, or other level. The agent may identify one or more hot spot methods based on current or past performance, functionality, content, or business relevancy. Based on learning techniques, efficient monitoring, and resource management, the present system may capture data for and provide analysis information for outliers of a web application with very low overhead.
Abstract:
Provided are methods and computer program products for generating a model of network application health. Methods may include receiving activity data that corresponds to activities of multiple applications that are operable to execute on at least one networked device, and combining the received activity data to remove redundant portions thereof and/or to reconcile inconsistencies therein. Based on the received activity data, ones of the multiple applications are identified, and relationships between the identified applications are determined. A model is generated including the identified applications and the relationships therebetween, and a representation of the model is displayed. Related computer program products are also provided.