Abstract:
A dynamic performance profiler is operable to receive, in substantially real-time, raw performance data from a testing platform. A software-based image executes on a target hardware platform (e.g., simulated or actual) on the testing platform, and the testing platform monitors that execution to generate corresponding raw performance data, which is communicated to the dynamic profiler in substantially real-time as it is generated. The dynamic profiler may be configured to archive select portions of the received raw performance data to data storage. As the raw performance data is received, the dynamic profiler analyzes it to determine whether the performance of the software-based image on the target hardware platform violates a predefined performance constraint. When the performance constraint is violated, the dynamic profiler archives a portion of the received raw performance data.
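The archive-on-violation behaviour described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function `profile_stream`, the `violates` predicate, and the idea of keeping a fixed-size window of recent samples are not taken from the abstract, which does not say how the archived portion is chosen.

```python
from collections import deque
from typing import Callable, Iterable, List


def profile_stream(samples: Iterable[float],
                   violates: Callable[[float], bool],
                   window: int = 3) -> List[List[float]]:
    """Consume samples as they arrive and archive recent ones on a violation.

    `window` (hypothetical) is how many of the most recent samples,
    including the violating one, are copied to the archive.
    """
    recent = deque(maxlen=window)   # rolling buffer of the latest samples
    archive: List[List[float]] = []
    for sample in samples:
        recent.append(sample)
        if violates(sample):        # predefined performance constraint
            archive.append(list(recent))
    return archive
```

For example, with a constraint of "latency above 8" and a window of 2, the stream 1, 2, 9, 3, 10 would archive the windows [2, 9] and [3, 10]; samples that never sit near a violation are discarded.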
Abstract:
A system, method, and computer program product capture performance-characteristic data from the execution of a program and model system performance based on that data. The approach targets performance-characterization data based on easily captured reuse distance metrics, where reuse distance is defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing such metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited-associativity cache organizations. Methods for assessing cache utilization as well as parallel execution are covered.
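The reuse distance definition above can be made concrete with a short sketch. It assumes the trace is a simple in-memory sequence of identifiers (for example, cache-line addresses); the function name `reuse_distances` is hypothetical, and the sketch measures every access rather than using whatever efficient capture method the abstract has in mind.

```python
from typing import Dict, Hashable, List, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[int]:
    """Return, for each re-access, the number of memory references that
    occurred between it and the previous access to the same datum."""
    last_seen: Dict[Hashable, int] = {}   # datum -> index of its latest access
    distances: List[int] = []
    for index, datum in enumerate(trace):
        if datum in last_seen:
            # References strictly between the two accesses.
            distances.append(index - last_seen[datum] - 1)
        last_seen[datum] = index
    return distances
```

For the trace A, B, C, A, B, B this yields distances 2, 2 and 0: two references separate the accesses to A, two separate the first pair of accesses to B, and none separate the last pair.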
Abstract:
A method and system for metering and analyzing usage and performance data of virtualized compute and network infrastructures is disclosed. The processing functions for the metered data are divided into 'processing units' that are configured to execute on a server (or a plurality of interconnected servers). Each processing unit receives input from an upstream processing unit and processes the metered data to produce output for a downstream processing unit. The types of processing units, as well as their order, are user-configurable (e.g., via an XML file), which eliminates the need to modify the source code of the data processing application itself and thereby saves considerable time, money, and development resources required to manage the virtualized compute and network infrastructure.
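A configurable chain of processing units might look roughly like the sketch below. The XML layout (`<pipeline>` containing `<unit type="..."/>` elements), the unit names, and the helper functions are all assumptions for illustration; the abstract only states that the unit types and their order can be set in an XML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of processing units; each takes and returns a list.
UNIT_TYPES = {
    "filter_positive": lambda records: [r for r in records if r > 0],
    "scale": lambda records: [r * 2 for r in records],
    "total": lambda records: [sum(records)],
}

# Hypothetical configuration: unit types and their order live here,
# so reordering the pipeline needs no change to the code.
CONFIG = """
<pipeline>
  <unit type="filter_positive"/>
  <unit type="scale"/>
  <unit type="total"/>
</pipeline>
"""


def build_pipeline(xml_text):
    """Resolve each <unit> element, in document order, to a callable."""
    root = ET.fromstring(xml_text)
    return [UNIT_TYPES[unit.get("type")] for unit in root.findall("unit")]


def run_pipeline(units, records):
    """Feed each unit's output to the next unit downstream."""
    for unit in units:
        records = unit(records)
    return records
```

Running `run_pipeline(build_pipeline(CONFIG), [3, -1, 4])` filters out the negative reading, doubles the rest, and sums them, giving `[14]`.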
Abstract:
A system and method for optimizing system performance includes applying (160) sampling-based optimization to identify optimal configurations of a computing system by selecting (162) a number of configuration samples and evaluating (166) system performance based on the samples. Based on feedback from the evaluated samples, a location of an optimal configuration is inferred (170). Additional samples are generated (176) towards the location of the inferred optimal configuration to further optimize the system configuration.
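The sample, evaluate, infer, and resample loop can be sketched as follows for a single numeric configuration parameter. This is a simplification under stated assumptions: the "inferred location" is taken to be the best sample seen so far, the search range is halved each round, and the names `sample_optimize` and `evaluate` are hypothetical. The abstract does not specify how the location is inferred.

```python
import random


def sample_optimize(evaluate, low, high, rounds=6, samples_per_round=30, seed=0):
    """Search [low, high] for the configuration value that maximizes `evaluate`."""
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    center, radius = (low + high) / 2, (high - low) / 2
    for _ in range(rounds):
        for _ in range(samples_per_round):
            # Draw a sample near the current inferred optimum, kept in bounds.
            x = min(high, max(low, rng.uniform(center - radius, center + radius)))
            score = evaluate(x)
            if score > best_score:
                best_x, best_score = x, score
        # Infer the optimum's location from feedback and narrow the search.
        center = best_x
        radius /= 2
    return best_x, best_score
```

With a toy objective such as `lambda x: -(x - 3) ** 2` on the range 0 to 10, successive rounds concentrate samples around x = 3.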
Abstract:
A system, method, and computer program product capture performance-characteristic data from the execution of a program and model system performance based on that data. The approach targets performance-characterization data (140) based on an easily captured reuse distance metric, defined as the total number of memory references (230) between two accesses to the same piece of data (240). Methods for efficiently capturing such metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited-associativity cache organizations (330). Methods for assessing cache utilization as well as parallel execution are covered.
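As a point of reference for the LRU-related metrics mentioned above, the sketch below computes the miss ratio of a fully associative LRU cache by direct simulation. Note the assumption: this is plain trace-driven simulation, not the statistical modelling from reuse distance data that the abstract describes, and `lru_miss_ratio` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_miss_ratio(trace: Sequence[Hashable], capacity: int) -> float:
    """Simulate a fully associative LRU cache holding `capacity` lines."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace) if trace else 0.0
```

On the cyclic trace A, B, C, A, B, C a two-line cache misses on every access (ratio 1.0), while a three-line cache misses only on the first three (ratio 0.5), which shows how sharply LRU behaviour depends on whether the reuse pattern fits in the cache.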
Abstract:
One embodiment includes a system for application-layer monitoring of communication between one or more database clients and one or more database servers. The system includes one or more decoders residing at a decoding layer above a network layer. The decoders reside at a first network location between one or more database clients residing at one or more second network locations and one or more database servers residing at one or more third network locations. The decoders receive database messages communicated from the database clients and intended for the database servers and database messages communicated from the database servers and intended for the database clients, decode the database messages, and extract query-language statements from the database messages. The system also includes a monitoring application residing at an application layer above the decoding layer. The monitoring application resides at the first network location. The monitoring application receives query-language statements extracted at the decoders and records observations on the database messages based on the query-language statements extracted at the decoders.
Abstract:
Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine whether any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutator's performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
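One simple way to flag delinquent regions from sampled miss addresses is sketched below. The fixed region size, the count threshold, and the function name `delinquent_regions` are assumptions for illustration; the abstract does not say how regions are delimited or what counts as a "high" number of misses.

```python
from collections import Counter
from typing import Iterable, List


def delinquent_regions(miss_addresses: Iterable[int],
                       region_size: int = 4096,
                       threshold: int = 3) -> List[int]:
    """Return base addresses of regions with at least `threshold` sampled misses."""
    # Bucket each sampled miss address into its fixed-size heap region.
    counts = Counter(addr // region_size for addr in miss_addresses)
    return sorted(region * region_size
                  for region, n in counts.items() if n >= threshold)
```

A memory manager could then restrict layout work, such as relocating objects during garbage collection, to the regions this returns instead of treating the whole heap uniformly.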
Abstract:
A memory module includes a memory hub coupled to several memory devices. The memory hub includes at least one performance counter that tracks one or more system metrics, for example, page hit rate, number or percentage of prefetch hits, cache hit rate or percentage, read rate, number of read requests, write rate, number of write requests, rate or percentage of memory bus utilization, local hub request rate or number, and/or remote hub request rate or number.
Abstract:
An embedded system includes a microprocessor and performance measuring logic coupled to the microprocessor and configured to record selected performance metrics. The performance metrics may be one or more of the following exemplary metrics: overall execution time of a particular routine, number of instruction cycles executed in the routine, number of cache hits in the routine, total number of memory reads in the routine, total number of memory accesses (reads and writes) in the routine, number of control bus read cycles in the routine, number of control bus cycles (reads and writes) in the routine, number of non-cacheable read cycles in the routine, and total number of non-cacheable access cycles (reads and writes) in the routine. In general, a counter is configured to record statistics for each of the performance metrics, and the counters may be controlled using a programmable mask, which is included in a memory coupled to the microprocessor. Based on these metrics, designers may fine-tune software for the embedded system.
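The idea of counters gated by a programmable mask can be modelled in software as below. This is only an illustrative model of hardware behaviour: the bit assignments in `Metric`, the class `PerfCounters`, and the `record` method are hypothetical and do not come from the abstract.

```python
from enum import IntFlag


class Metric(IntFlag):
    """Hypothetical bit positions in the programmable mask."""
    CYCLES = 1
    CACHE_HITS = 2
    MEM_READS = 4
    MEM_WRITES = 8


class PerfCounters:
    """One counter per metric; only metrics enabled in the mask are counted."""

    def __init__(self, mask: Metric):
        self.mask = mask
        self.counts = {metric: 0 for metric in Metric}

    def record(self, metric: Metric, amount: int = 1) -> None:
        # An event is counted only when its bit is set in the mask.
        if self.mask & metric:
            self.counts[metric] += amount
```

For instance, with the mask set to `Metric.CYCLES | Metric.MEM_READS`, cache-hit events are ignored while cycle and memory-read events accumulate, so a designer can narrow measurement to the metrics relevant to the routine under study.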
Abstract:
A performance monitor system includes a core processor (115), a core processor associated device, such as a cache (123), and first logic, such as performance logic (127). The core processor (115) is operable to execute information. The core processor associated device provides a first signal (CACHE_PERF), which defines performance of the core processor associated device (123) during operation of the core processor (115). The first logic (127) is coupled to the core processor associated device (123) and monitors the first signal (CACHE_PERF) in response to a second signal (WPT0,1), which defines a match of user-settable attributes associated with the operation of the core processor (115).