Abstract:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
Abstract:
Embodiments relate to tracking cache lines. An aspect of embodiments includes performing an operation by a processor. Another aspect of embodiments includes fetching a cache line based on the operation. Yet another aspect of embodiments includes storing in an instruction address register file at least one of (i) an operation identifier identifying the operation and (ii) a memory location identifier identifying a level of memory from which the cache line is populated.
Abstract:
The current application is directed to architected hardware support within computer processors for detecting and monitoring various types of potential performance imbalances with respect to simultaneously executing hardware threads in simultaneous multi-threading (“SMT”) processors and SMT-processor cores. The architected hardware support may include various types of performance-imbalance-monitoring registers that accumulate indications of performance imbalances and that can be used, by performance-monitoring software and by human analysts to detect performance-degrading conflicts between simultaneously executing hardware threads. Such conflicts can be ameliorated by changing the scheduling of virtual machines, tasks, and other computational entities, by redesigning and re-implementing all or portions of performance-limited and performance-degrading applications, by altering resource-allocation strategies, and by other means. In addition, performance imbalance detection and monitoring can be used to provide accurate, computational-throughput-based accounting in cloud-computing environments.
Abstract:
A method for configuring a large hybrid memory subsystem having a large cache size in a computing system where one or more performance metrics of the computing system are expressed as an explicit function of configuration parameters of the memory subsystem and workload parameters of the memory subsystem. The computing system hosts applications that utilize the memory subsystem, and the performance metrics cover the use of the memory subsystem by the applications. A performance goal containing values for the performance metric is identified for the computing system. These values for the performance metrics are used in the explicit function of performance metrics, configuration parameters and workload parameters to calculate values for the configuration parameters that achieve the identified performance goal. The calculated values of the configuration parameters are implemented in the memory subsystem.
Abstract:
Systems and methods for run-time switching for simulation with dynamic run-time accuracy adjustment. In one embodiment, a computer implemented method performs a simulation of a computer instruction executing on a simulated hardware design by a first simulation model, wherein the first simulation model provides first timing information of the simulation. The first timing information is stored to a computer usable media. A pending subsequent simulation of the instruction is detected. Responsive to the presence of the first timing information in the computer usable media, the computer instruction is simulated by a second simulation model, wherein the second simulation model provides less accurate second timing information of the simulation than the first simulation model. The simulation run time information is updated for the subsequent simulation with the first timing information.
Abstract:
Systems and methods for cache optimization, the method comprising monitoring cache access rate for one or more cache tenants in a computing environment, wherein a first cache tenant is allocated a first cache having a first cache size which may be adjusted; determining a cache profile for at least the first cache over one or more time intervals according to data collected during the monitoring, analyzing the cache profile for the first cache to determine an expected cache usage model for the first cache; and analyzing the cache usage model and factors related to cache efficiency for the one or more cache tenants to dictate one or more constraints that define boundaries for the first cache size.
Abstract:
A technique includes providing data indicative of a counted value acquired by a hardware counter of a processing core during a time segment in which a plurality of tasks are active on the core and, in a processor-based machine, determining a likelihood that the counted value is attributable to a given task of the tasks during the time segment and attributing a portion of the counted value to the given task based at least in part on the determined likelihood.
Abstract:
A method and system for the management of tracing data of interest in a data processing system comprises identifying the location and length of one or more such units of interest as each unit is stored in main memory during execution of the program and recording a logical assignment of each unit of interest to a slot in a wrap around trace buffer. Copying of the units of interest to the trace buffer is deferred unless one or more predefined events occur. Such events may include an attempt to overwrite the data which has been logically assigned or a request for information stored in the trace buffer. The recorded assignments are discarded whenever it is calculated that the capacity of the trace buffer would be exceeded resulting in the corresponding units never needing to be copied to the trace buffer.
Abstract:
Embodiments of the present invention provide an adaptive cache system and an adaptive cache system for a hybrid storage system. Specifically, in a typical embodiment, an input/out (I/O) traffic analysis component is provided for monitoring data traffic and providing a traffic analysis based thereon. An adaptive cache algorithm component is coupled to the I/O traffic analysis component for applying a set of algorithms to determine a storage schema for handling the data traffic. Further, an adaptive cache policy component is coupled to the adaptive cache algorithm component.
Abstract:
Techniques for multi-core resource utilization planning are provided. An agent is deployed on each core of a multi-core machine. The agents cooperate to perform one or more tests. The tests result in measurements for performance and thermal characteristics of each core and each communication fabric between the cores. The measurements are organized in a resource utilization map and the map is used to make decisions regarding core assignments for resources.