Abstract:
Methods and systems for reporting anomalous events include building a process graph that models states of process-level events in a network. A topology graph is built that models source and destination relationships between connection events in the network. A set of alerts is clustered based on the process graph and the topology graph. Clustered alerts that exceed a threshold level of trustworthiness are reported.
Abstract:
A system and method for optimizing system performance includes applying (160) sampling based optimization to identify optimal configurations of a computing system by selecting (162) a number of configuration samples and evaluating (166) system performance based on the samples. Based on feedback of evaluated samples, a location of an optimal configuration is inferred (170). Additional samples are generated (176) towards the location of the inferred optimal configuration to further optimize a system configuration.
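The sample, evaluate, infer, and resample loop described above can be sketched as follows. This is a minimal illustration, not the patented method: the `evaluate` scoring function, the two tuning parameters (threads and cache size), and the rule of shrinking the search box around the best samples are all assumptions made for the example.

```python
import random


def evaluate(config):
    """Hypothetical performance score; higher is better.

    Assumed to peak at threads=32 and cache_mb=256 for illustration only.
    """
    threads, cache_mb = config
    return -((threads - 32) ** 2) - ((cache_mb - 256) / 8) ** 2


def optimize(bounds, rounds=5, samples_per_round=20, keep=5, seed=0):
    """Return (best_score, best_config) found by iterative sampling."""
    rng = random.Random(seed)
    box = list(bounds)  # current search region, one (low, high) pair per parameter
    best = None
    for _ in range(rounds):
        # Select a number of configuration samples inside the current region.
        samples = [
            tuple(rng.uniform(lo, hi) for lo, hi in box)
            for _ in range(samples_per_round)
        ]
        # Evaluate system performance for each sample.
        scored = sorted(((evaluate(s), s) for s in samples), reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        # Infer where the optimum lies: here, the bounding box of the top samples.
        top = [s for _, s in scored[:keep]]
        box = [
            (min(s[d] for s in top), max(s[d] for s in top))
            for d in range(len(box))
        ]
    return best


score, config = optimize([(1, 128), (16, 1024)])
```

Each round draws its samples from a smaller region than the last, so later samples concentrate near the inferred optimum.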
Abstract:
A computer-implemented method, system, and computer program product are provided for an anomaly detection system in streaming networks. The method includes receiving (810), by a processor, a plurality of vertices and edges from a streaming graph. The method also includes generating (820), by the processor, graph codes for the plurality of vertices and edges. The method additionally includes determining (830), by the processor, edge codes in real-time responsive to the graph codes. The method further includes identifying (840), by the processor, an anomaly based on a distance between edge codes and all current cluster centers. The method also includes controlling (850) an operation of a processor-based machine to change a state of the processor-based machine, responsive to the anomaly.
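The distance test in the identifying step can be illustrated with a short sketch. The abstract does not say how graph codes or edge codes are computed, so the sketch assumes each vertex already has a small numeric code and that an edge code is the concatenation of its endpoints' codes; `edge_code`, `is_anomalous`, and the threshold are hypothetical names and values.

```python
import math


def edge_code(vertex_codes, src, dst):
    """Build an edge code from two vertex codes (assumed: simple concatenation)."""
    return vertex_codes[src] + vertex_codes[dst]


def is_anomalous(code, centers, threshold):
    """Flag an edge whose code is far from every current cluster center."""
    nearest = min(math.dist(code, center) for center in centers)
    return nearest > threshold


# Hypothetical vertex codes and a single cluster center of normal edges.
vertex_codes = {"a": (0.0, 0.1), "b": (0.1, 0.0), "x": (5.0, 5.0)}
centers = [(0.0, 0.0, 0.0, 0.0)]

normal = is_anomalous(edge_code(vertex_codes, "a", "b"), centers, threshold=1.0)
unusual = is_anomalous(edge_code(vertex_codes, "a", "x"), centers, threshold=1.0)
```

A streaming implementation would also update the cluster centers as edges arrive; that step is omitted here.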
Abstract:
Disclosed is a method and apparatus for performing capacity planning and resource optimization in a distributed system. In particular, the capacity needs of individual components (e.g., server, operating system, CPU, application software, memory, networking device, storage device, etc.) in a distributed system can be analyzed using relationships between measurements collected from the distributed system. These relationships, called invariants, do not change over time. From these measurements, a network of invariants is determined. The network of invariants characterizes the relationships between the measurements. The capacity need of at least one component in the distributed system can be determined from the network of invariants.
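As an illustration of how an invariant could feed a capacity estimate, the sketch below fits a linear relationship between two measurements and extrapolates it to a higher load. The linear form, the function names, and the sample numbers are assumptions for the example; the abstract does not specify the model family.

```python
def fit_invariant(xs, ys):
    """Least-squares fit of y = slope * x + intercept between two measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def required_capacity(invariant, load):
    """Predict the dependent measurement at a given load using the invariant."""
    slope, intercept = invariant
    return slope * load + intercept


# Hypothetical measurements: requests per second versus CPU utilization (%).
requests_per_second = [100, 200, 300]
cpu_percent = [12, 22, 32]

invariant = fit_invariant(requests_per_second, cpu_percent)
cpu_at_peak = required_capacity(invariant, 800)
```

In a network of invariants, a prediction like this would be propagated from one measurement to the next along the fitted relationships.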
Abstract:
Methods and systems for reporting anomalous events include intra-host clustering a set of alerts based on a process graph that models states of process-level events in a network. Hidden relationship clustering is performed on the intra-host clustered alerts based on hidden relationships between alerts in respective clusters. Inter-host clustering is performed on the hidden relationship clustered alerts based on a topology graph that models source and destination relationships between connection events in the network. Inter-host clustered alerts that exceed a threshold level of trustworthiness are reported.
Abstract:
A method and system automatically derive models between monitored quantities under non-faulty conditions so that subsequent faults can be detected as deviations from the derived models. The invention identifies unusual conditions for fault detection and isolation, a capability absent in rule-based systems.
Abstract:
A computer-implemented method for real-time detecting of abnormal network connections is presented. The computer-implemented method includes collecting network connection events from at least one agent connected to a network, recording, via a topology graph, normal states of network connections among hosts in the network, and recording, via a port graph, relationships established between host and destination ports of all network connections.
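A minimal sketch of the two graphs follows, assuming each connection event carries a source host, a destination host, and a destination port. The class and method names are hypothetical, and the rule used here (flag any host pair or port not seen during a learning phase) is a simplification chosen for illustration.

```python
from collections import defaultdict


class ConnectionBaseline:
    """Records normal connections, then flags events that fall outside them."""

    def __init__(self):
        self.topology = defaultdict(set)  # source host -> destination hosts seen
        self.ports = defaultdict(set)     # destination host -> destination ports seen

    def learn(self, src, dst, dst_port):
        """Record one connection event observed under normal conditions."""
        self.topology[src].add(dst)
        self.ports[dst].add(dst_port)

    def is_abnormal(self, src, dst, dst_port):
        """Return True if the host pair or the destination port is unseen."""
        # .get avoids creating empty entries for hosts that were never learned.
        unseen_pair = dst not in self.topology.get(src, set())
        unseen_port = dst_port not in self.ports.get(dst, set())
        return unseen_pair or unseen_port


baseline = ConnectionBaseline()
baseline.learn("web-1", "db-1", 5432)
baseline.learn("web-2", "db-1", 5432)
```

After the learning phase, `baseline.is_abnormal("web-1", "db-1", 22)` would flag the connection because port 22 was never recorded for `db-1`.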
Abstract:
A method and system are provided. The method includes performing (320), by a logs-to-time-series converter, a logs-to-time-series conversion by transforming a plurality of heterogeneous logs into a set of time series. Each of the heterogeneous logs includes a time stamp and text portion with one or more fields. The method further includes performing (330), by a time-series-to-sequential-pattern converter, a time-series-to-sequential-pattern conversion by mining invariant relationships between the set of time series, and discovering sequential message patterns and association rules in the plurality of heterogeneous logs using the invariant relationships. The method also includes executing (340), by a processor, a set of log management applications, based on the sequential message patterns and the association rules.
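The logs-to-time-series step can be sketched as counting messages per pattern in fixed time buckets. The log line layout (an ISO 8601 time stamp, a space, then the text), the rule of replacing digit runs with a placeholder to form a pattern, and the function name are assumptions for this example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone


def to_time_series(lines, bucket_seconds=60):
    """Map each message pattern to a Counter of {time bucket: message count}."""
    series = defaultdict(Counter)
    for line in lines:
        stamp, text = line.split(" ", 1)
        # Treat the time stamp as UTC so bucketing does not depend on local time.
        moment = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
        bucket = int(moment.timestamp()) // bucket_seconds
        # Collapse variable numeric fields so similar messages share one pattern.
        pattern = re.sub(r"\d+", "<num>", text.strip())
        series[pattern][bucket] += 1
    return series


logs = [
    "2024-01-01T00:00:05 login user 17",
    "2024-01-01T00:00:30 login user 42",
    "2024-01-01T00:01:10 login user 5",
]
series = to_time_series(logs)
```

Invariant mining would then look for stable relationships between the resulting per-pattern series; that step is not shown.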
Abstract:
A method and system for coordinating energy management in a virtualized data center including a plurality of physical servers and a plurality of virtual machines (VMs), includes analyzing status information about the virtualized data center; determining server utilization target settings for server consolidation from the analyzed status information; and executing the server consolidation according to the determined server utilization target settings. Server consolidation can be executed by determining an effective size of each of the VMs and placing the VMs on the servers in a selective manner using an independent workload VM placement process, a correlation-aware VM placement process, or a migration-cost and correlation-aware VM placement process.
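For the simplest of the three placement options, a first-fit-decreasing heuristic gives a rough picture of consolidation under a utilization target. First-fit decreasing is swapped in here as a well-known bin-packing heuristic; the abstract does not state that the independent workload placement process works this way, and the function name and sizes are hypothetical.

```python
def place_vms(vm_sizes, server_capacity, target_utilization=0.8):
    """Pack VMs onto as few servers as first-fit decreasing finds.

    vm_sizes maps a VM name to its effective size; no server is loaded
    beyond server_capacity * target_utilization.
    """
    limit = server_capacity * target_utilization
    servers = []  # one list of VM names per server
    loads = []    # current load of each server
    # Place the largest VMs first.
    for name, size in sorted(vm_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            raise ValueError(f"VM {name!r} does not fit on any server")
        for i, load in enumerate(loads):
            if load + size <= limit:
                servers[i].append(name)
                loads[i] += size
                break
        else:
            # No existing server has room, so power on another one.
            servers.append([name])
            loads.append(size)
    return servers


placement = place_vms({"a": 50, "b": 30, "c": 30, "d": 20}, server_capacity=100)
```

A correlation-aware variant would additionally avoid co-locating VMs whose demand peaks coincide, which this sketch does not attempt.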
Abstract:
A method and system for diagnosing a detected failure in a computer system compare a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure. If the archived failure signature matches the failure signature of the detected failure, an archived solution is applied to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure.
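The signature lookup can be sketched as below. The abstract does not define what counts as a match, so this example assumes a signature is a set of observed symptoms and that two signatures match when their Jaccard similarity reaches a threshold; the names, the threshold, and the archive contents are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two symptom sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diagnose(signature, archive, min_similarity=0.8):
    """Return the archived solution of the best-matching signature, or None."""
    best_solution, best_score = None, 0.0
    for archived_signature, solution in archive:
        score = jaccard(signature, archived_signature)
        if score >= min_similarity and score > best_score:
            best_solution, best_score = solution, score
    return best_solution


# Hypothetical archive of (failure signature, solution) pairs.
archive = [
    ({"disk_full", "write_error"}, "rotate logs"),
    ({"oom", "swap_high"}, "restart service"),
]
solution = diagnose({"disk_full", "write_error"}, archive)
```

When no archived signature is similar enough, `diagnose` returns `None`, which corresponds to a failure that has no archived solution.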