Abstract:
A method and apparatus for extending cache coherency to hold buffered data to support transactional execution is herein described. A transactional store operation referencing an address associated with a data item is performed in a buffered manner. Here, the coherency state associated with cache lines to hold the data item are transitioned to a buffered state. In response to local requests for the buffered data item, the data item is provided to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure the transactionally updated data item is not made globally visible until commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible.
Abstract:
A method and apparatus for metaphysical address space for holding lossy metadata is herein described. An explicit or implicit metadata access operation referencing data address of a data item is encountered. Hardware modifies the data address to a metadata address including a metaphysical extension. The metaphysical extension overlays one or more metaphysical address space(s) on the data address space. A portion of the metadata address including the metaphysical extension is utilized to search a tag array of the cache memory holding the data item. As a result, metadata access operations only hit metadata entries of the cache based on the metadata address extension. However, as the metadata is held within the cache, the metadata potentially competes with data for space within the cache.
Abstract:
Embodiments of apparatus and methods for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
Abstract:
A method and system to reduce the power consumption of a memory device. In one embodiment of the invention, the memory device is a N-way set-associative level one (L1) cache memory and there is logic coupled with the data cache memory to facilitate access to only part of the N-ways of the N-way set-associative L1 cache memory in response to a load instruction or a store instruction. By reducing the number of ways to access the N-way set-associative L1 cache memory for each load or store request, the power requirements of the N-way set-associative L1 cache memory is reduced in one embodiment of the invention. In one embodiment of the invention, when a prediction is made that the accesses to cache memory only requires the data arrays of the N-way set-associative L1 cache memory, the access to the fill buffers are deactivated or disabled.
Abstract:
A method and apparatus for monitoring memory accesses in hardware to support transactional execution is herein described. Attributes are monitor accesses to data items without regard for detection at physical storage structure granularity, but rather ensuring monitoring at least at data items granularity. As an example, attributes are added to state bits of a cache to enable new cache coherency states. Upon a monitored memory access to a data item, which may be selectively determined, coherency states associated with the data item are updated to a monitored state. As a result, invalidating requests to the data item are detected through combination of the request type and the monitored coherency state of the data item.
Abstract:
A processor may support a two-dimensional (2-D) gather instruction and a 2-D cache. The processor may perform the 2-D gather instruction to access one or more sub-blocks of data from a two-dimensional (2-D) image stored in a memory coupled to the processor. The two-dimensional (2-D) cache may store the sub-blocks of data in a multiple cache lines. Further, the 2-D cache may support access of more than one cache lines while preserving a two-dimensional structure of the 2-D image.
Abstract:
A method and apparatus for providing a memory model for hardware attributes to support transactional execution is herein described. Upon encountering a load of a hardware attribute, such as a test monitor operation to load a read monitor, write monitor, or buffering attribute, a fault is issued in response to a loss field indicating the hardware attribute has been lost. Furthermore, dependency actions, such as blocking and forwarding, are provided for the attribute access operations based on address dependency and access type dependency. As a result, different scenarios for attribute loss and testing thereof are allowed and restricted in a memory model.
Abstract:
A method and system to reduce the power consumption of a memory device. In one embodiment of the invention, the memory device is a N-way set-associative level one (L1) cache memory and there is logic coupled with the data cache memory to facilitate access to only part of the N-ways of the N-way set-associative L1 cache memory in response to a load instruction or a store instruction. By reducing the number of ways to access the N-way set-associative L1 cache memory for each load or store request, the power requirements of the N-way set-associative L1 cache memory is reduced in one embodiment of the invention. In one embodiment of the invention, when a prediction is made that the accesses to cache memory only requires the data arrays of the N-way set-associative L1 cache memory, the access to the fill buffers are deactivated or disabled.