Abstract:
Methods and apparatus to manage bypassing of a first cache are disclosed. In one such method, a load instruction having an expected latency greater than or equal to a predetermined threshold is identified. A request is then made to schedule the identified load instruction to have a perdetermined latency. The software program is then scheduled. An actual latency associated with the load instruction in the scheduled software program is then compared to the predetermined latency. If the actual latency is greater than or equal to the predetermined latency, the load instruction is marked to bypass the first cache.
Abstract:
Embodiments of apparatus and methods for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
Abstract:
Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
Abstract:
In one embodiment, the present invention includes a prefetching engine to detect when data access strides in a memory fall into a range, to compute a predicted next stride, to selectively prefetch a cache line using the predicted next stride, and to dynamically control prefetching. Other embodiments are also described and claimed.
Abstract:
An arrangement is provided for data value recovery in an optimized program by precisely allocating predicate registers to guard branching instructions in the optimized program at compilation time. At execution time, an execution path leading to a recovery point is determined based on values of predicate registers guarding branching blocks. The values of non-current and non-resident data may be recovered at the recovery point according to the determined execution path. Optimization annotations may also be utilized for data value recovery.
Abstract:
A method and apparatus for providing compiler transformation of code using regions with simplified data and control flow and value specialization are described. In one embodiment, the method includes identifying in the code a plurality of potential candidates for value specialization, selecting a group of candidates from the plurality of potential candidates based on a value profile associated with each potential candidate, and determining specialized data for each selected candidate using a corresponding value profile. The method further includes forming a plurality of optimized regions based on corresponding specialized data. Each optimized region includes one or more selected candidates.
Abstract:
An apparatus and method is described herein for conditionally committing /andor speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.
Abstract:
Systeme, Vorrichtungen und Verfahren für ein Hardware- und Softwaresystem zum automatischen Zerlegen eines Programms in mehrere parallele Threads werden beschrieben. In einigen Ausführungsformen führen die Systeme und Vorrichtungen ein Verfahren zum Zerlegen eines ursprünglichen Codes und/oder einer generierten Thread-Ausführung aus.
Abstract:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
Abstract:
In one embodiment, the present invention includes a prefetching engine to detect when data access strides in a memory fall into a range, to compute a predicted next stride, to selectively prefetch a cache line using the predicted next stride, and to dynamically control prefetching. Other embodiments are also described and claimed.