Abstract:
The present invention captures API-level calls using a combination of dynamic instrumentation and library overriding. It allows event-level tracing of API function calls and returns and can generate an execution trace. The instrumentation is lightweight and relies on the dynamic/shared library linking mechanisms available in most operating systems, so no source code modification or binary injection is needed. The tool can capture parameter values and return values, which can be used to correlate traces across API function calls to generate transaction flow logic.
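The idea of overriding a library entry point to record call and return events can be illustrated with a minimal Python sketch. This is only an analogy: it wraps attributes of an in-process object rather than using the operating system's dynamic linker, and the names `TRACE`, `traced`, `override`, and the stand-in library `lib` are hypothetical.

```python
import functools
import time
import types

TRACE = []  # in-memory execution trace; the event layout is an assumption


def traced(func):
    """Wrap func so that each call and each normal return is recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("call", func.__name__, args, kwargs, time.monotonic()))
        result = func(*args, **kwargs)
        # Only normal returns are recorded; exceptions propagate untraced.
        TRACE.append(("return", func.__name__, result, time.monotonic()))
        return result
    return wrapper


def override(library, name):
    """Replace library.name with a traced version; callers stay unchanged."""
    setattr(library, name, traced(getattr(library, name)))


def _add(a, b):
    return a + b


# Hypothetical stand-in for a shared library exposing one API function.
lib = types.SimpleNamespace(add=_add)

override(lib, "add")
value = lib.add(2, 3)
event_kinds = [event[0] for event in TRACE]
```

The recorded parameter and return values are what a later step could use to correlate one API call with another.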
Abstract:
Methods and systems for performance inference include inferring an internal application status based on a unified call stack trace that includes both user and kernel information by inferring user function instances. A calling context encoding is generated that includes information regarding function calling paths. The analysis includes performing a top-down latency breakdown and ranking calling contexts according to how costly each function calling path is.
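A top-down latency breakdown over calling paths might look like the following sketch. It assumes each sample is already a (calling path, latency) pair, with the path written as a plain tuple of function names; the abstract's calling context encoding and the merging of user and kernel stacks are not reproduced here, and `rank_calling_contexts` is a hypothetical name.

```python
from collections import defaultdict


def rank_calling_contexts(samples):
    """Rank calling contexts by total latency, most costly first.

    samples: iterable of (path, latency), where path is a tuple of
    function names from the root caller down to the leaf.
    """
    totals = defaultdict(float)
    for path, latency in samples:
        # Top-down: charge the latency to every prefix of the path, so a
        # caller's cost includes the cost of everything it calls.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += latency
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


samples = [
    (("main", "read", "sys_read"), 5.0),
    (("main", "parse"), 2.0),
    (("main", "read", "sys_read"), 3.0),
]
ranked = rank_calling_contexts(samples)
```

In this example the root context `("main",)` ranks first because it accumulates the latency of every path beneath it.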
Abstract:
A debugging system used for a data center in a network is disclosed. The system includes a monitoring engine to monitor network traffic by collecting traffic information from a network controller, a modeling engine to model an application signature, an infrastructure signature, and a task signature using a monitored log, a debugging engine to detect a change in the application signature between a working status and a non-working status using a reference log and a problem log, and to validate the change using the task signature, and a providing unit to provide troubleshooting information, wherein an unknown change in the application signature is correlated to a known problem class by considering a dependency to a change in the infrastructure signature. Other methods and systems also are disclosed.
Abstract:
A method and system for coordinating energy management in a virtualized data center including a plurality of physical servers and a plurality of virtual machines (VMs), includes analyzing status information about the virtualized data center; determining server utilization target settings for server consolidation from the analyzed status information; and executing the server consolidation according to the determined server utilization target settings. Server consolidation can be executed by determining an effective size of each of the VMs and placing the VMs on the servers in a selective manner using an independent workload VM placement process, a correlation-aware VM placement process, or a migration-cost and correlation-aware VM placement process.
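Placing VMs on servers under a utilization target can be sketched as a bin-packing step. The example below uses first-fit decreasing, which is a generic heuristic chosen for illustration and is not claimed to be any of the placement processes named in the abstract; `place_vms` and its parameters are hypothetical, and the effective sizes are assumed to be given.

```python
def place_vms(vm_sizes, server_capacity, utilization_target):
    """Assign VMs to servers with first-fit decreasing.

    vm_sizes: dict mapping VM name to its effective size.
    Returns a list of servers, each a list of VM names.
    """
    limit = server_capacity * utilization_target
    servers = []  # each entry is [used_capacity, [vm names]]
    # Consider the largest VMs first so they claim space early.
    for name, size in sorted(vm_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            raise ValueError(f"VM {name} exceeds the per-server limit")
        for server in servers:
            if server[0] + size <= limit:
                server[0] += size
                server[1].append(name)
                break
        else:
            # No existing server has room, so power on another one.
            servers.append([size, [name]])
    return [names for _, names in servers]


placement = place_vms({"a": 50, "b": 40, "c": 25, "d": 20},
                      server_capacity=100, utilization_target=0.8)
```

A correlation-aware variant would additionally take into account how the VMs' workloads vary together, which this sketch ignores.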
Abstract:
A method and system for diagnosing a detected failure in a computer system compares a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure. If the archived failure signature matches the failure signature of the detected failure, an archived solution is applied to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure.
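The lookup of an archived solution by failure signature can be sketched as follows. The abstract does not say how signatures are represented or compared; this sketch assumes a signature is a set of symptom labels and that a match means Jaccard similarity at or above a threshold. The names `jaccard`, `diagnose`, and the sample archive are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two symptom sets: size of overlap over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diagnose(signature, archive, threshold=0.8):
    """Return the solution of the best-matching archived signature, or None."""
    best_solution = None
    best_score = threshold
    for archived_signature, solution in archive:
        score = jaccard(signature, archived_signature)
        if score >= best_score:
            best_solution, best_score = solution, score
    return best_solution


archive = [
    ({"disk_full", "write_error"}, "rotate logs"),
    ({"oom", "swap_high"}, "restart service"),
]
known = diagnose({"disk_full", "write_error"}, archive)
unknown = diagnose({"cpu_spike"}, archive)
```

A return value of `None` corresponds to the case where no archived signature matches and no archived solution can be applied.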
Abstract:
A system and method for prioritizing alerts includes extracting invariants to determine a stable set of models for determining relationships among monitored system data. Equivalent thresholds for a plurality of rules are computed using an invariant network developed by extracting the invariants. For a given time window, a set of alerts is received from a system being monitored. A measurement value of the alerts is compared with a vector of equivalent thresholds, and the set of alerts is ranked.
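The final comparison and ranking step might be sketched as below. The abstract does not specify the scoring rule, so this example assumes that an alert's score is the fraction of equivalent thresholds its measurement exceeds; the equivalent thresholds themselves are taken as given rather than derived from an invariant network, and `rank_alerts` is a hypothetical name.

```python
def rank_alerts(measurements, equivalent_thresholds):
    """Rank alert names by the share of equivalent thresholds exceeded.

    measurements: dict mapping alert name to its measurement value.
    equivalent_thresholds: dict mapping alert name to a non-empty list
    of thresholds (assumed to be precomputed).
    """
    def score(name):
        thresholds = equivalent_thresholds[name]
        exceeded = sum(1 for t in thresholds if measurements[name] > t)
        return exceeded / len(thresholds)

    return sorted(measurements, key=score, reverse=True)


measurements = {"mem": 60, "cpu": 95}
equivalent_thresholds = {"cpu": [80, 90, 85], "mem": [50, 70, 75]}
ranking = rank_alerts(measurements, equivalent_thresholds)
```

Here the `cpu` alert exceeds all three of its thresholds and the `mem` alert only one, so `cpu` is ranked first.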
Abstract:
A computer-implemented method executed on a processor (214) for automatically analyzing log contents received via a network (803) and detecting content-level anomalies is presented. The computer-implemented method includes building a statistical model (103) based on contents of a set of training logs and detecting, based on the set of training logs, content-level anomalies (106) for a set of testing logs. The method further includes maintaining an index and metadata, generating attributes for fields, editing model capability to incorporate user domain knowledge, detecting anomalies using field attributes, and improving anomaly quality by using user feedback (107).
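Training on one set of logs and flagging content-level anomalies in another can be sketched in a few lines. The model here is deliberately simple: it records the set of values seen for each field and flags any unseen value. This is a stand-in for the statistical model in the abstract, not a description of it; parsed logs are assumed to be dictionaries, and `build_model` and `detect_anomalies` are hypothetical names.

```python
from collections import defaultdict


def build_model(training_logs):
    """Record, per field, every value observed in the training logs."""
    model = defaultdict(set)
    for record in training_logs:
        for field, value in record.items():
            model[field].add(value)
    return model


def detect_anomalies(model, testing_logs):
    """Return (log index, field, value) for each value not seen in training."""
    anomalies = []
    for index, record in enumerate(testing_logs):
        for field, value in record.items():
            if value not in model.get(field, set()):
                anomalies.append((index, field, value))
    return anomalies


training = [
    {"level": "INFO", "service": "auth"},
    {"level": "WARN", "service": "db"},
]
testing = [
    {"level": "INFO", "service": "db"},
    {"level": "FATAL", "service": "auth"},
]
model = build_model(training)
anomalies = detect_anomalies(model, testing)
```

User feedback could be folded in by adding confirmed-benign values to the model's per-field sets.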
Abstract:
Methods for system failure prediction include clustering log files according to structural log patterns. Feature representations of the log files are determined based on the log clusters. A likelihood of a system failure is determined based on the feature representations using a neural network. An automatic system control action is performed if the likelihood of system failure exceeds a threshold.
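The first two steps, grouping log lines by structural pattern and turning the groups into features, can be sketched as follows. The pattern rule (replace every run of digits with a placeholder) is an assumption made for illustration, the neural network is omitted, and `log_pattern` and `feature_vector` are hypothetical names.

```python
import re
from collections import Counter


def log_pattern(line):
    """Reduce a log line to its structure by masking numeric fields."""
    return re.sub(r"\d+", "<NUM>", line).strip()


def feature_vector(lines, vocabulary):
    """Count how often each known pattern occurs in a log file."""
    counts = Counter(log_pattern(line) for line in lines)
    return [counts[pattern] for pattern in vocabulary]


lines = [
    "worker 3 started",
    "worker 7 started",
    "disk 1 failed",
]
vocabulary = ["worker <NUM> started", "disk <NUM> failed"]
features = feature_vector(lines, vocabulary)
```

A vector like `features` is what a trained model would consume to produce a failure likelihood, which would then be compared against the threshold.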
Abstract:
Methods and systems for reporting anomalous events include building a process graph that models states of process-level events in a network. A topology graph is built that models source and destination relationships between connection events in the network. A set of alerts is clustered based on the process graph and the topology graph. Clustered alerts that exceed a threshold level of trustworthiness are reported.
Abstract:
A system, program, and method for detecting anomalies in heterogeneous logs are disclosed. The system has a processor configured to identify pattern fields comprised of a plurality of event identifiers. The processor is further configured to generate an automata model by profiling event behaviors of a plurality of event sequences, the plurality of event sequences grouped in the automata model by combinations of one or more pattern fields and one or more event identifiers from among the plurality of event identifiers, wherein for a given combination, the one or more event identifiers therein must be respectively comprised in a same one of the one or more pattern fields with which it is combined. The processor is also configured to detect an anomaly in one of the plurality of event sequences using the automata model. The processor is additionally configured to control an anomaly-initiating one of the network devices based on the anomaly.
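Profiling event sequences into an automaton and checking a new sequence against it can be sketched as below. The sketch assumes the logs have already been grouped into per-identifier sequences of event types and models only which transitions were observed; the grouping by pattern fields and the device control step are not shown, and `learn_automaton` and `find_anomaly` are hypothetical names.

```python
from collections import defaultdict


def learn_automaton(sequences):
    """Collect the event-to-event transitions observed in normal sequences."""
    transitions = defaultdict(set)
    for events in sequences:
        for current, following in zip(events, events[1:]):
            transitions[current].add(following)
    return transitions


def find_anomaly(transitions, events):
    """Return the first transition never seen in training, or None."""
    for current, following in zip(events, events[1:]):
        if following not in transitions.get(current, set()):
            return (current, following)
    return None


normal_sequences = [
    ["open", "read", "close"],
    ["open", "write", "close"],
]
automaton = learn_automaton(normal_sequences)
violation = find_anomaly(automaton, ["open", "close"])
no_violation = find_anomaly(automaton, ["open", "read", "close"])
```

In this example `open` followed directly by `close` was never observed, so that sequence is reported as anomalous.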