Abstract:
Provided are a method, system, and program for coordinating access to memory locations for hardware transactional memory transactions and software transactional memory transactions. A hardware transaction executing in hardware transactional memory initiates a request to access a memory location. A fault is returned to the hardware transaction request in response to an operation by one software transaction executing in a software transactional memory.
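The coordination described above can be illustrated with a minimal Python sketch. It assumes that a software transaction marks the locations it is operating on and that a hardware access to a marked location receives a fault. The names `SharedMemory`, `stm_acquire`, `stm_release`, `htm_access`, and `HardwareTransactionFault` are hypothetical and are not taken from the abstract, which does not say how the conflict is detected.

```python
class HardwareTransactionFault(Exception):
    """Hypothetical fault returned to a hardware transaction's access request."""


class SharedMemory:
    def __init__(self):
        self.values = {}        # address -> value
        self.stm_owned = set()  # addresses a software transaction is operating on (assumption)

    def stm_acquire(self, addr):
        # A software transaction begins operating on this location.
        self.stm_owned.add(addr)

    def stm_release(self, addr):
        # The software transaction is finished with this location.
        self.stm_owned.discard(addr)

    def htm_access(self, addr):
        # A hardware transaction requests access; fault if a software
        # transaction currently holds the location.
        if addr in self.stm_owned:
            raise HardwareTransactionFault(hex(addr))
        return self.values.get(addr, 0)
```

In this sketch the hardware transaction would typically catch the fault and abort or retry; that policy is left out because the abstract does not specify it.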
Abstract:
Embodiments of the present invention relate to a system and method for comparatively increasing processor throughput and relieving pressure on the processor's scheduler and register file by diverting instructions dependent on long-latency operations from a flow of the processor pipeline and re-introducing them into the flow when the long-latency operations are completed. In this way, the instructions do not tie up resources and overall instruction throughput in the pipeline is comparatively increased.
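As an illustration of the diversion idea above, the following Python sketch models a scheduler queue and a separate wait buffer. It is a simplified software model under stated assumptions, not the hardware described in the embodiments: the class `DivertingPipeline`, its method names, and the register-name bookkeeping are all hypothetical.

```python
from collections import deque


class DivertingPipeline:
    def __init__(self):
        self.scheduler = deque()    # instructions occupying scheduler slots
        self.wait_buffer = deque()  # diverted instructions (assumed side buffer)
        self.pending_loads = set()  # destination registers of outstanding long-latency ops
        self.poisoned = set()       # registers that transitively depend on them

    def issue_long_latency(self, dest):
        self.pending_loads.add(dest)
        self.poisoned.add(dest)

    def dispatch(self, name, dest, sources):
        if self.poisoned.intersection(sources):
            # Dependent on a long-latency result: divert instead of
            # holding a scheduler entry, and mark its result as dependent too.
            self.poisoned.add(dest)
            self.wait_buffer.append((name, dest, sources))
        else:
            self.scheduler.append((name, dest, sources))

    def complete_long_latency(self, dest):
        self.pending_loads.discard(dest)
        diverted = list(self.wait_buffer)
        self.wait_buffer.clear()
        # Recompute dependence from the loads that are still outstanding,
        # then re-introduce diverted instructions in their original order.
        self.poisoned = set(self.pending_loads)
        for name, d, sources in diverted:
            self.dispatch(name, d, sources)
```

For example, after `issue_long_latency("r1")`, an instruction reading `r1` is placed in `wait_buffer`, while an unrelated instruction goes straight to `scheduler`; calling `complete_long_latency("r1")` moves the diverted instruction back into the scheduler queue.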
Abstract:
A device is presented including a first processor and a second processor. A number of memory devices are connected to the first processor and the second processor. A register buffer is connected to the first processor and the second processor. A trace buffer is connected to the first processor and the second processor. A number of memory instruction buffers are connected to the first processor and the second processor. The first processor and the second processor perform single threaded applications using multithreading resources. A method is also presented where a first thread is executed from a first processor. The first thread is also executed from a second processor as directed by the first processor. The second processor executes instructions ahead of the first processor.
Abstract:
In one embodiment of the invention, a processor includes a memory order buffer (MOB) (178) including load buffers (182) and store buffers (184), wherein the MOB orders load and store instructions so as to maintain data coherency between load and store instructions in different threads, wherein at least one of the threads is dependent on at least another one of the threads. In another embodiment of the invention, a processor includes an execution pipeline to concurrently execute at least portions of threads, wherein at least one of the threads is dependent on at least another one of the threads, the execution pipeline including a memory order buffer that orders load and store instructions. The processor also includes detection circuitry to detect speculation errors associated with load instructions in a load buffer.
Abstract:
In one embodiment of the invention, a processor (50) includes an execution pipeline (108) to concurrently execute at least portions of threads T1-T4, wherein at least one of the threads is dependent on at least another one of the threads. The processor (50) also includes detection circuitry to detect speculation errors in the execution of the threads T1-T4. In another embodiment, the processor (50) includes thread management logic (124) to control dynamic creation of threads from a program (112A).
Abstract:
In one embodiment of the invention, a processor (10) includes an execution pipeline to execute instructions, wherein at least some of the instructions are executed speculatively. The processor also includes a trace buffer (114) outside the execution pipeline to hold the instructions, and wherein instructions that are associated with speculation errors are replayed in the execution pipeline from the trace buffer. In another embodiment, a processor includes an execution pipeline to execute instructions, wherein at least some of the instructions are executed speculatively. The processor also includes a trace buffer outside the execution pipeline to hold instructions and results of the execution of the instructions, wherein at least some of the instructions are subject to an initial retirement following execution in the pipeline, but remain in the trace buffer until a final retirement (134).
Abstract:
Methods and apparatus to provide transactional memory execution in out-of-order processors are described. In one embodiment, a stored value corresponds to the number of transactional memory access requests that are uncommitted. The stored value may be used to provide nested recovery in case of an error, fault, etc. in accordance with a described embodiment.
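One possible reading of the stored value above is a nesting counter, sketched here in Python. This is an assumption for illustration only: the abstract does not say how the value is maintained or how recovery proceeds. The class `TransactionNestingCounter`, its methods, and the choice to unwind to the outermost checkpoint on a fault are hypothetical.

```python
class TransactionNestingCounter:
    def __init__(self):
        self.uncommitted = 0   # stored value: number of uncommitted transactional requests
        self.checkpoints = []  # one saved state per open nesting level (assumption)

    def begin(self, state):
        # Entering a (possibly nested) transactional region: save state, bump the count.
        self.checkpoints.append(dict(state))
        self.uncommitted += 1

    def commit(self):
        # Committing the innermost region releases its checkpoint.
        if self.uncommitted == 0:
            raise RuntimeError("commit without a matching begin")
        self.checkpoints.pop()
        self.uncommitted -= 1

    def abort(self):
        # On an error or fault, the count says how many levels are open.
        # This sketch unwinds all of them and returns the outermost checkpoint.
        if self.uncommitted == 0:
            raise RuntimeError("abort without an open transaction")
        restored = self.checkpoints[0]
        self.checkpoints.clear()
        self.uncommitted = 0
        return restored
```

A caller would restore architectural state from the value returned by `abort()`; unwinding only the innermost level would be an equally valid design choice under a different recovery policy.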
Abstract:
A device is presented including a first processor and a second processor. A number of memory devices are connected to the first processor and the second processor. A register buffer is connected to the first processor and the second processor. A trace buffer is connected to the first processor and the second processor. A number of memory instruction buffers are connected to the first processor and the second processor. The first processor and the second processor perform single threaded applications using multithreading resources. A method is also presented where a first thread is executed from a first processor. The first thread is also executed from a second processor as directed by the first processor. The second processor executes instructions ahead of the first processor.
Abstract:
A device is presented including a first processor and a second processor. A number of memory devices are connected to the first processor and the second processor. A register buffer is connected to the first processor and the second processor. A trace buffer is connected to the first processor and the second processor. A number of memory instruction buffers are connected to the first processor and the second processor. The first processor and the second processor perform single threaded applications using multithreading resources. A method is also presented where a first thread is executed from a first processor. The first thread is also executed from a second processor as directed by the first processor. The second processor executes instructions ahead of the first processor.
Abstract:
In one embodiment, a processor includes thread management logic including a thread predictor having state machines to indicate whether thread creation opportunities should be taken or not taken. The processor includes a predictor training mechanism to receive retired instructions and to identify potential threads from the retired instructions and to determine whether a potential thread of interest meets a test of thread goodness, and if the test is met, one of the state machines that is associated with the potential thread of interest is updated in a take direction, and if the test is not met, the state machine is updated in a not take direction. The thread management logic may control creation of an actual thread and may further include reset logic to control whether the actual thread is reset and wherein if the actual thread is reset, one of the state machines associated with the actual thread is updated in a not take direction. The final retirement logic may control whether the actual thread is retired, and wherein if the actual thread is retired, the state machine associated with the actual thread is updated in a take direction. The circuitry may be used in connection with a multi-threading processor that detects speculation errors involving thread dependencies in execution of the actual threads and re-executes instructions associated with the speculation errors from trace buffers outside an execution pipeline.
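The update rules for the thread predictor's state machines can be sketched in Python as follows. The sketch assumes each state machine is a 2-bit saturating counter indexed by a thread spawn point; the abstract specifies neither the number of states nor the indexing, so `MAX_STATE`, `TAKE_THRESHOLD`, `INITIAL_STATE`, and all class and method names are hypothetical.

```python
class ThreadPredictor:
    MAX_STATE = 3       # assumption: 2-bit saturating counter (states 0..3)
    TAKE_THRESHOLD = 2  # assumption: states 2 and 3 mean "take"
    INITIAL_STATE = 1   # assumption: start weakly "not take"

    def __init__(self):
        self.state = {}  # hypothetical spawn-point address -> counter state

    def should_take(self, spawn_point):
        return self.state.get(spawn_point, self.INITIAL_STATE) >= self.TAKE_THRESHOLD

    def _update(self, spawn_point, take):
        s = self.state.get(spawn_point, self.INITIAL_STATE)
        s = min(s + 1, self.MAX_STATE) if take else max(s - 1, 0)
        self.state[spawn_point] = s

    def train(self, spawn_point, goodness_met):
        # Training from retired instructions: move toward "take" if the
        # potential thread meets the goodness test, otherwise away from it.
        self._update(spawn_point, goodness_met)

    def on_reset(self, spawn_point):
        # An actual thread was reset: update in the not take direction.
        self._update(spawn_point, False)

    def on_retire(self, spawn_point):
        # An actual thread was finally retired: update in the take direction.
        self._update(spawn_point, True)
```

With these assumed parameters, a single successful training event at a spawn point is enough to flip the prediction to "take", and two consecutive resets from the saturated state are needed to flip it back.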