Abstract:
In accordance with methods and systems consistent with the present invention, an improved processor performance instrumentation system is provided that allows a software tester to measure more performance indicators than there are hardware counters during a single execution of a tested program. The improved processor performance instrumentation system accomplishes this by "multiplexing" performance indicators while executing the tested program. In effect, methods and systems consistent with the present invention extend the abilities of the limited number of hardware counters to allow them to measure a number of performance indicators otherwise not allowed during one execution of the tested program.
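As a rough illustration of the multiplexing idea, the following Python sketch rotates groups of indicators onto a limited number of counters, one group per time slice, and then scales each raw count by the fraction of slices in which that indicator was actually measured. The scaling step, the function name `multiplex_counts`, and the `sample` callback are assumptions made for this example; the abstract does not say how the multiplexed counts are combined.

```python
from typing import Callable, Dict, List, Optional


def multiplex_counts(
    indicators: List[str],
    num_counters: int,
    num_slices: int,
    sample: Callable[[str, int], int],
) -> Dict[str, Optional[float]]:
    """Estimate totals for more indicators than there are counters.

    `sample(name, t)` is a hypothetical stand-in for reading the hardware
    counter assigned to indicator `name` at the end of time slice `t`.
    """
    if num_counters < 1:
        raise ValueError("need at least one hardware counter")

    # Split the indicators into groups that fit the available counters.
    groups = [
        indicators[i:i + num_counters]
        for i in range(0, len(indicators), num_counters)
    ]
    raw = {name: 0 for name in indicators}
    active = {name: 0 for name in indicators}

    # Rotate through the groups, one group per time slice.
    for t in range(num_slices):
        for name in groups[t % len(groups)]:
            raw[name] += sample(name, t)
            active[name] += 1

    # Assumed estimator: scale by (total slices / slices measured).
    # An indicator that was never scheduled gets None.
    return {
        name: (raw[name] * num_slices / active[name]) if active[name] else None
        for name in indicators
    }


# Hypothetical per-slice event rates used only to exercise the sketch.
rates = {"cycles": 100, "instructions": 80, "l1_miss": 5, "branch_miss": 2}
estimates = multiplex_counts(
    list(rates), num_counters=2, num_slices=4,
    sample=lambda name, t: rates[name],
)
```

With constant rates the scaled estimate matches the true total; with bursty workloads it is only an approximation, which is the usual trade-off of this kind of multiplexing.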
Abstract:
The present invention discloses a processor system comprising a processor (31) and at least a first memory (32) and a second memory (34, 36, 37). The first memory (32) is normally faster than the second one, and means for memory allocation (38, 41, 48) perform the periodically static allocation of data into the first memory (32). The means for memory allocation (38, 41, 48) are run-time updateable by software. An execution profiling section (39) is provided for continuously or intermittently providing execution data used for updating the means for memory allocation (38, 41, 48). According to the invention, the memory allocation is performed on a variable or record (49, 50) level. The means for memory allocation preferably use linking tables (41, 48) supporting dynamic software changes. The first memory (32) is preferably an SRAM, connected to the processor by a dedicated bus (33).
Abstract:
A method of selecting a cache design for a computer system begins with the making (S11) of a prototype module with a processor (15), a "seed" cache (17), and a trace detection module (31). The prototype module can be inserted within a system that includes main memory (23) and peripherals (25). While an application program is run (S21) on the system, the communications between the processor and the seed cache are detected (S22) and compressed (S23). The compressed detections are stored (S24) in a trace capture module and collectively define a trace of the program on the prototype module. The trace is then expanded (S31) and used to evaluate (S32) a candidate cache design. The expansion and evaluation can be iterated to evaluate many cache designs. The method can be used to pick the cache design with the best performance or as a foundation for performing a cost/performance comparison of the evaluated caches. In this method, a single prototype is used to generate an accurate trace that permits many alternative cache designs to be evaluated. This contrasts with methods that use cacheless models to develop less accurate traces and methods that allow only one cache design to be evaluated per prototype. In summary, the invention provides an accurate and efficient method of evaluating alternative cache designs.
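The evaluation step (S32) can be pictured as replaying one expanded address trace against several candidate designs. The sketch below is a minimal illustration that assumes direct-mapped candidates described only by a line count and a block size; the function name `hit_rate`, the trace values, and the candidate list are hypothetical and not taken from the abstract.

```python
from typing import List, Optional, Tuple


def hit_rate(trace: List[int], num_lines: int, block_size: int) -> float:
    """Replay an address trace against a direct-mapped cache model."""
    tags: List[Optional[int]] = [None] * num_lines
    hits = 0
    for addr in trace:
        block = addr // block_size      # which memory block is referenced
        index = block % num_lines       # the cache line it maps to
        if tags[index] == block:
            hits += 1                   # block already resident
        else:
            tags[index] = block         # miss: fill the line
    return hits / len(trace) if trace else 0.0


# One captured trace (hypothetical addresses) reused for every candidate.
trace = [0, 4, 64, 0, 4, 64]
candidates: List[Tuple[int, int]] = [(1, 16), (8, 16)]  # (lines, block size)
results = {c: hit_rate(trace, *c) for c in candidates}
best = max(results, key=results.get)
```

Because the trace is captured once, adding another candidate costs only one more replay, which mirrors the iteration of expansion and evaluation described above.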
Abstract:
Load balancing of activities on physical disk storage devices is accomplished by monitoring reading and writing operations to blocks of contiguous storage locations on the physical disk storage devices. A list of exchangeable pairs of blocks is developed based on size and function. Statistics accumulated over an interval are then used to obtain access activity values for each block and each physical disk drive. A statistical analysis leads to a selection of one block pair. After testing to determine any adverse effect of making that change, the exchange is made to more evenly distribute the loading on individual physical disk storage devices.
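The selection of one block pair can be sketched as follows. This example assumes that the "statistical analysis" reduces to picking the exchange that most narrows the gap between the busiest and the least busy disk; that criterion, the function name `best_exchange`, and the sample data are assumptions for illustration, since the abstract does not define the analysis.

```python
from typing import Dict, List, Optional, Tuple


def best_exchange(
    activity: Dict[str, int],
    location: Dict[str, str],
    pairs: List[Tuple[str, str]],
) -> Tuple[Optional[Tuple[str, str]], int]:
    """Pick the exchangeable block pair that best evens out disk load.

    activity: accesses per block accumulated over the interval.
    location: block name -> disk name.
    pairs:    pre-built list of exchangeable block pairs.
    Returns (chosen pair or None, resulting max-min load spread).
    """
    def spread(loc: Dict[str, str]) -> int:
        load: Dict[str, int] = {}
        for block, disk in loc.items():
            load[disk] = load.get(disk, 0) + activity[block]
        return max(load.values()) - min(load.values())

    best_pair, best_spread = None, spread(location)
    for a, b in pairs:
        if location[a] == location[b]:
            continue                     # swapping within one disk changes nothing
        trial = dict(location)
        trial[a], trial[b] = location[b], location[a]
        s = spread(trial)
        if s < best_spread:              # keep only exchanges that help
            best_pair, best_spread = (a, b), s
    return best_pair, best_spread


activity = {"a": 60, "b": 50, "c": 10, "d": 20}
location = {"a": "disk0", "b": "disk0", "c": "disk1", "d": "disk1"}
choice = best_exchange(activity, location, [("a", "c"), ("b", "c")])
```

Returning `None` when no exchange improves the spread loosely corresponds to the test for adverse effects before the exchange is made.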
Abstract:
A processor-based device (102) incorporating an on-chip trace cache (200) and supporting circuitry for providing software performance profiling information. A trigger control register (219a) is configured to initialize and trigger (start) a first on-chip counter upon entry into a selected procedure. A second trigger control register (219b) is used to stop the first counter when the procedure prologue of the selected procedure is entered. Counter values reflecting the lapsed execution time of the selected procedure are then stored in the on-chip trace cache. A second counter is also provided. The second counter runs continually, but is reset to zero following a stop trigger event caused by the second trigger control register. The stop trigger event also causes the value of the second counter to be placed in the on-chip trace cache (200). This second counter value provides the frequency of occurrence of a procedure of interest, whereas the first counter provides information about the procedure's execution time. Either post-processing software executing on a target system, a host system utilizing a debug port, or off-chip trace capture hardware can be used to analyze the profile data.
Abstract:
A processor-based device (102) incorporating an on-chip instruction trace cache (200) capable of providing information for reconstructing instruction execution flow. The trace information can be captured without halting normal processor (104) operation. Both serial (204) and parallel (214) communication channels are provided for communicating the trace information to external devices. In the disclosed embodiment of the invention, instructions that disrupt the instruction flow are reported, particularly instructions in which the target address is in some way data dependent. For example, call instructions or unconditional branch instructions in which the target address is provided from a data register (or other memory location such as a stack) cause a trace cache entry to be generated. In the case of many unconditional branches or sequential instructions, no entry is placed into the trace cache (200) because the target address can be completely determined from the instruction stream. Other information provided by the instruction trace cache (200) includes: the target address of a trap or interrupt handler, the target address of a return instruction, addresses from procedure returns, task identifiers, and trace capture stop/start information.
Abstract:
A performance enhancement system including software implemented in a Performance Assistant File System (PAFS, 52) to enable efficient and economical performance evaluation and tuning of data servers (58, 60, 62, 64) and networks is disclosed. The PAFS of the present invention is a subcomponent of a performance assistant (PA, 72, 74) architecture which includes a set of powerful software tools (54) that help the user/system administrator tune a computer system to yield maximum performance. Specifically, the present invention enables the user to diagnose and tune operating systems without the need for laboratory conditions and tools, such that the full potential of the processor's architecture can be realized.
Abstract:
Various examples described herein provide for deferred write back based on age time. According to some examples, an age time for a cached instance stored on a data cache is monitored and, based on the age time, a cache table entry for the cached instance may be modified to indicate that the cached instance is a candidate for a deferred write back period. A controller may monitor for a deferred write back period based on data activity of the data cache. During a deferred write back period, the cached instance may be written back from volatile memory to the non-volatile memory based on whether the cache table entry indicates that the cached instance has been modified and based on whether the cache table entry indicates that the cached instance is a candidate for the deferred write back period.
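A minimal sketch of the two checks described above is given below. It assumes a cache table keyed by name, an age measured from the time the instance was cached, and a fixed age threshold; the `Entry` fields, the function names, and the decision to clear the modified flag after write back are illustrative assumptions rather than details from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    """One hypothetical cache table entry for a cached instance."""
    data: bytes
    cached_at: float          # time the instance entered the data cache
    modified: bool = False    # instance differs from non-volatile copy
    candidate: bool = False   # eligible for the deferred write back period


def mark_candidates(table: Dict[str, Entry], now: float, max_age: float) -> None:
    """Flag entries whose age time has reached the assumed threshold."""
    for entry in table.values():
        if now - entry.cached_at >= max_age:
            entry.candidate = True


def deferred_write_back(table: Dict[str, Entry], backing: Dict[str, bytes]) -> List[str]:
    """During a deferred period, write back entries that are both
    modified and marked as candidates; return the keys written."""
    written = []
    for key, entry in table.items():
        if entry.modified and entry.candidate:
            backing[key] = entry.data   # stand-in for non-volatile memory
            entry.modified = False      # assumed: entry is clean afterwards
            written.append(key)
    return written


table = {
    "x": Entry(b"1", cached_at=0.0, modified=True),   # old and dirty
    "y": Entry(b"2", cached_at=9.0, modified=True),   # dirty but too young
    "z": Entry(b"3", cached_at=0.0, modified=False),  # old but clean
}
backing: Dict[str, bytes] = {}
mark_candidates(table, now=10.0, max_age=5.0)
written = deferred_write_back(table, backing)
```

Detecting when a deferred write back period begins (for example, from low data activity) is left out of the sketch; the caller is assumed to invoke `deferred_write_back` only during such a period.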
Abstract:
Aspects include computing devices, systems, and methods for implementing monitoring communications between components and a memory hierarchy of a computing device. The computing device may determine at least one identifying factor for identifying execution of the processor-executable code. A communication between the components and the memory hierarchy of the computing device may be monitored for at least one communication factor of a same type as the at least one identifying factor. A determination whether a value of the at least one identifying factor matches a value of the at least one communication factor may be made. The computing device may determine that the processor-executable code is executed in response to determining that the value of the at least one identifying factor matches the value of the at least one communication factor.
Abstract:
A block-based storage system may implement page cache write logging. Write requests for a data volume maintained at a storage node may be received at the storage node. A page cache may be updated in accordance with the request. A log record describing the page cache update may be stored in a page cache write log maintained in a persistent storage device. Once the write request is performed in the page cache and recorded in a log record in the page cache write log, the write request may be acknowledged. Upon recovery from a system failure where data in the page cache is lost, log records in the page cache write log may be replayed to restore to the page cache a state of the page cache prior to the system failure.
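The write path and the recovery path can be sketched in a few lines. In this illustration an in-memory list stands in for the persistent page cache write log, and the class name `StorageNode` and its methods are hypothetical; a real system would need durable, ordered log storage before acknowledging.

```python
from typing import Dict, List, Optional, Tuple


class StorageNode:
    """Toy model of page cache write logging (not a real storage API)."""

    def __init__(self, log: Optional[List[Tuple[int, bytes]]] = None) -> None:
        # The list stands in for a log kept on a persistent storage device.
        self.log: List[Tuple[int, bytes]] = log if log is not None else []
        self.page_cache: Dict[int, bytes] = {}

    def write(self, page: int, data: bytes) -> str:
        self.page_cache[page] = data        # 1. update the page cache
        self.log.append((page, data))       # 2. record the update in the log
        return "ack"                        # 3. only then acknowledge

    def recover(self) -> None:
        """Rebuild the page cache by replaying log records in order."""
        self.page_cache = {}
        for page, data in self.log:
            self.page_cache[page] = data


node = StorageNode()
node.write(1, b"a")
node.write(2, b"b")
node.write(1, b"c")
before_failure = dict(node.page_cache)

node.page_cache.clear()   # simulate losing the volatile page cache
node.recover()
```

Replaying records in log order matters: the later write to page 1 must overwrite the earlier one for the restored state to match the state before the failure.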