Abstract:
A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. Each of the coherent master controllers includes a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining that a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to a selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
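The ordering rule above lends itself to a small software model. The following C++ sketch is illustrative only, not the patented implementation; the message name TgtReqGO and the method names are hypothetical. It captures just the behavior stated in the abstract: probes to an address are held between receipt of the globally ordered message and the requesting client's acknowledgement of the data.

```cpp
// Minimal sketch (assumed names, not identifiers from the patent): a
// coherent master that blocks probes to an address between a "target
// request globally ordered" (TgtReqGO) message and the client's ack.
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

struct Probe { uint64_t addr; int sourceNode; };

class CoherentMaster {
    // Addresses with an outstanding, globally ordered read: probes held here.
    std::unordered_map<uint64_t, std::queue<Probe>> blocked_;
public:
    // The coherent slave signaled the read is globally ordered and will
    // have exactly one data response: start blocking probes to this address.
    void onTgtReqGO(uint64_t addr) { blocked_.try_emplace(addr); }

    // Returns true if the probe may be serviced now; blocked ones are queued.
    bool tryServiceProbe(const Probe& p) {
        auto it = blocked_.find(p.addr);
        if (it == blocked_.end()) return true;   // not blocked: service it
        it->second.push(p);                      // hold until client ack
        return false;
    }

    // Requesting client acknowledged receipt of the response data:
    // release the address and return the probes that were held.
    std::vector<Probe> onClientAck(uint64_t addr) {
        std::vector<Probe> released;
        auto it = blocked_.find(addr);
        if (it == blocked_.end()) return released;
        while (!it->second.empty()) {
            released.push_back(it->second.front());
            it->second.pop();
        }
        blocked_.erase(it);
        return released;
    }
};

int main() {
    CoherentMaster m;
    m.onTgtReqGO(0x80);
    bool serviced = m.tryServiceProbe({0x80, 2});  // held: read outstanding
    auto released = m.onClientAck(0x80);           // ack releases the probe
    return (!serviced && released.size() == 1) ? 0 : 1;
}
```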
Abstract:
Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
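A minimal sketch of such a directory follows, assuming 64-byte lines and 4 KiB regions (64 lines per region); the class and field names are hypothetical, since the abstract specifies only region-granular tracking and an entry that is reclaimed when its reference count reaches zero.

```cpp
// Illustrative region-based cache directory with per-entry reference
// counts; an entry is reclaimed when its count drops to zero.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

constexpr uint64_t kLineBytes   = 64;
constexpr uint64_t kRegionLines = 64;                       // 4 KiB region
constexpr uint64_t kRegionBytes = kLineBytes * kRegionLines;

struct RegionEntry {
    uint32_t refCount = 0;   // aggregate cached lines in this region
    uint64_t sharers  = 0;   // bitmask of cache subsystems holding lines
};

class RegionDirectory {
    std::unordered_map<uint64_t, RegionEntry> entries_;     // key: region id
    static uint64_t regionOf(uint64_t addr) { return addr / kRegionBytes; }
public:
    void onLineCached(uint64_t addr, unsigned node) {
        RegionEntry& e = entries_[regionOf(addr)];
        e.refCount++;
        e.sharers |= (1ull << node);
    }
    void onLineEvicted(uint64_t addr) {
        auto it = entries_.find(regionOf(addr));
        if (it == entries_.end()) return;
        if (--it->second.refCount == 0)
            entries_.erase(it);          // reclaim entry when count hits zero
    }
    size_t size() const { return entries_.size(); }
};

int main() {
    RegionDirectory dir;
    dir.onLineCached(0x1000, 0);   // two lines in the same 4 KiB region
    dir.onLineCached(0x1040, 1);   // -> one directory entry, refCount == 2
    dir.onLineEvicted(0x1000);
    dir.onLineEvicted(0x1040);     // refCount reaches zero: entry reclaimed
    std::printf("entries: %zu\n", dir.size());   // prints 0
}
```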
Abstract:
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
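Building on the node-level sketch above, the memory-side half can be modeled as one entry per region that is present in any node-level directory. The sketch below is an assumption-laden illustration: the bitmask representation and all names are hypothetical.

```cpp
// Illustrative memory-side directory: tracks, per region, which nodes
// currently hold a node-level directory entry for that region.
#include <cstdint>
#include <unordered_map>

class MemoryDirectory {
    std::unordered_map<uint64_t, uint64_t> nodeMask_;  // region -> node bitmask
public:
    // A node allocated its first node-level entry for this region.
    void onNodeTracks(uint64_t region, unsigned node) {
        nodeMask_[region] |= (1ull << node);
    }
    // A node's reference count hit zero and it reclaimed its entry.
    void onNodeReclaims(uint64_t region, unsigned node) {
        auto it = nodeMask_.find(region);
        if (it == nodeMask_.end()) return;
        it->second &= ~(1ull << node);
        if (it->second == 0)
            nodeMask_.erase(it);   // no node tracks the region any longer
    }
    // Nodes that may cache lines of this region (probe targets).
    uint64_t sharers(uint64_t region) const {
        auto it = nodeMask_.find(region);
        return it == nodeMask_.end() ? 0 : it->second;
    }
};

int main() {
    MemoryDirectory md;
    md.onNodeTracks(7, 0);
    md.onNodeTracks(7, 3);
    md.onNodeReclaims(7, 0);                      // node 0 dropped its entry
    return md.sharers(7) == (1ull << 3) ? 0 : 1;  // only node 3 remains
}
```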
Abstract:
Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
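One way to picture the prediction unit is a table trained on which node last supplied a given region, consulted as each request passes through the fabric on its way to memory. The C++ sketch below uses a simple last-owner table as a stand-in; the abstract does not specify the prediction scheme, and all names are hypothetical.

```cpp
// Illustrative prediction unit: watches requests headed to memory and
// nominates a node for a speculative probe when a hit is predicted.
#include <cstdint>
#include <optional>
#include <unordered_map>

class PredictionUnit {
    std::unordered_map<uint64_t, int> lastOwner_;   // region -> predicted node
    static uint64_t regionOf(uint64_t addr) { return addr >> 12; }
public:
    // Called as a request from `requester` traverses the fabric toward
    // memory; returns the node to probe speculatively, if any.
    std::optional<int> onRequest(uint64_t addr, int requester) {
        auto it = lastOwner_.find(regionOf(addr));
        if (it != lastOwner_.end() && it->second != requester)
            return it->second;          // predict a cache-to-cache transfer
        return std::nullopt;            // no prediction: wait for memory
    }
    // Train the predictor when a response reveals the true holder.
    void train(uint64_t addr, int ownerNode) {
        lastOwner_[regionOf(addr)] = ownerNode;
    }
};

int main() {
    PredictionUnit pu;
    pu.train(0x4000, /*ownerNode=*/1);        // node 1 last held this region
    auto target = pu.onRequest(0x4040, /*requester=*/0);
    return (target && *target == 1) ? 0 : 1;  // probe node 1 speculatively
}
```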
Abstract:
A method and device are described for encoding erroneous data in an error correction code (ECC) protected memory. In one embodiment, incoming data including a plurality of data symbols and a data integrity marker is received. At least one extra symbol is used to mark the incoming data as error-free data or erroneous data (i.e., poison) based on the data integrity marker. ECC may be created to protect the data symbols. The ECC may include a plurality of check symbols, a plurality of unused symbols and the at least one extra symbol. In another embodiment, an error marker may be propagated from a single ECC word to all ECC words of a data block (e.g., a cache line, a page, and the like) to prevent errors due to corruption of the error marker caused by faulty memory in the erroneous ECC word.
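The extra-symbol poison marker and its block-wide propagation can be sketched as follows. The symbol counts, the placeholder checksum standing in for real ECC check-symbol arithmetic (e.g., Reed-Solomon), and all names are assumptions for illustration only.

```cpp
// Illustrative ECC word layout with an extra poison symbol, plus
// propagation of the marker to every ECC word of a block so a fault in
// one word cannot silently erase the marker.
#include <array>
#include <cstdint>

constexpr int kDataSyms  = 32;   // data symbols per ECC word (assumed)
constexpr int kCheckSyms = 4;    // check symbols (assumed)

struct EccWord {
    std::array<uint8_t, kDataSyms>  data{};
    std::array<uint8_t, kCheckSyms> check{};
    uint8_t poison = 0;          // extra symbol: nonzero marks bad data
};

EccWord encode(const std::array<uint8_t, kDataSyms>& in, bool dataValid) {
    EccWord w;
    w.data = in;
    uint8_t sum = 0;                       // placeholder for real ECC math
    for (uint8_t b : in) sum ^= b;
    w.check.fill(sum);
    w.poison = dataValid ? 0x00 : 0xFF;    // mark erroneous data (poison)
    return w;
}

// Propagate poison to every ECC word of a block (e.g., a cache line).
template <size_t N>
void propagatePoison(std::array<EccWord, N>& block) {
    for (const EccWord& w : block)
        if (w.poison) { for (EccWord& x : block) x.poison = 0xFF; return; }
}

int main() {
    std::array<uint8_t, kDataSyms> payload{};
    std::array<EccWord, 8> line{};                   // one cache line, 8 words
    line[0] = encode(payload, /*dataValid=*/false);  // one word is poisoned
    propagatePoison(line);                           // now the whole line is
    return line[7].poison == 0xFF ? 0 : 1;
}
```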
Abstract:
A data fabric routes requests between a plurality of requestors and a plurality of responders. The data fabric includes a crossbar router, a coherent slave controller coupled to the crossbar router, and a probe filter coupled to the coherent slave controller and tracking the state of cached lines of memory. Power state control circuitry operates, responsive to detecting any of a plurality of designated conditions, to cause the probe filter to enter a retention low power state in which a clock signal to the probe filter is gated while power is maintained to the probe filter. Entering the retention low power state is performed when all in-process probe filter lookups are complete.
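The entry sequence (gate the clock only after all in-process lookups have completed, while power stays on so directory state is retained) can be modeled in a few lines. The single-threaded C++ sketch below is illustrative rather than race-free hardware; all names are hypothetical.

```cpp
// Illustrative retention-entry sequencing for a probe filter: lookups
// must drain before the clock is gated; power is never removed, so the
// directory contents survive the retention state.
#include <cstdio>

class ProbeFilterPower {
    int  inflightLookups_ = 0;
    bool clockGated_      = false;
public:
    bool beginLookup() {                  // called by the coherent slave
        if (clockGated_) return false;    // must exit retention first
        ++inflightLookups_;
        return true;
    }
    void endLookup() { --inflightLookups_; }

    // Designated condition detected (e.g., fabric idle): gate the clock
    // only once every in-process lookup is complete.
    bool tryEnterRetention() {
        if (inflightLookups_ != 0) return false;   // drain lookups first
        clockGated_ = true;               // clock gated, power maintained
        return true;
    }
    void exitRetention() { clockGated_ = false; }
};

int main() {
    ProbeFilterPower pf;
    pf.beginLookup();
    std::printf("enter: %d\n", pf.tryEnterRetention());  // 0: lookup in flight
    pf.endLookup();
    std::printf("enter: %d\n", pf.tryEnterRetention());  // 1: safe to gate
}
```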
Abstract:
A data fabric routes requests between a plurality of requestors. A probe filter coupled to the data fabric tracks the state of cached lines of memory. Responsive to the data fabric leaving a non-operational power state while all requestors that are probe filter clients are in a non-operational power state, a power management controller delays a probe filter initialization state in which data regarding cached lines is initialized following the non-operational power state.
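The deferral can be modeled as a controller that skips probe filter initialization when the fabric wakes with no probe filter clients active, then runs it when the first client wakes. The sketch below is a hypothetical illustration; the abstract does not specify this structure.

```cpp
// Illustrative deferred probe filter initialization: init is skipped on
// fabric wake if no probe filter client is awake, and performed when
// the first client becomes operational.
#include <cstdio>

class PowerMgmtController {
    bool pfInitialized_ = false;
    int  activeClients_ = 0;     // probe filter clients currently awake
public:
    void onFabricWake() {
        if (activeClients_ == 0) {
            std::puts("fabric up, no PF clients: delay PF init");
            return;              // nothing can issue cacheable traffic yet
        }
        initProbeFilter();
    }
    void onClientWake() {
        if (++activeClients_ == 1 && !pfInitialized_)
            initProbeFilter();   // first client: run the deferred init
    }
    void onClientSleep() { --activeClients_; }
private:
    void initProbeFilter() {
        pfInitialized_ = true;   // e.g., clear/initialize directory state
        std::puts("probe filter initialized");
    }
};

int main() {
    PowerMgmtController pmc;
    pmc.onFabricWake();          // all clients asleep: init is delayed
    pmc.onClientWake();          // first client wakes: init runs now
}
```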